NAVAL RESEARCH LOGISTICS QUARTERLY

VOL. 20, NO. 3

NAVSO P-1278

F. D. Rigby 
Texas Tech University

B. J. McDonald 
Office of Naval Research 

O. Morgenstern 
New York University 

S. M. Selig 

Managing Editor 

Office of Naval Research 

Arlington, Va. 22217 


R. Bellman, RAND Corporation 

J. C. Busby, Jr., Captain, SC, USN (Retired) 

W. W. Cooper, Carnegie Mellon University 

J. G. Dean, Captain, SC, USN 

G. Dyer, Vice Admiral, USN (Retired) 

P. L. Folsom, Captain, USN (Retired) 

M. A. Geisler, RAND Corporation 

A. J. Hoffman, International Business Machines Corporation
H. P. Jones, Commander, SC, USN (Retired) 
S. Karlin, Stanford University 
H. W. Kuhn, Princeton University 
J. Laderman, Office of Naval Research 
R. J. Lundegard, Office of Naval Research 
W. H. Marlow, The George Washington University 
R. E. McShane, Vice Admiral, USN (Retired) 
W. F. Millson, Captain, SC, USN 
H. D. Moore, Captain, SC, USN (Retired) 

M. I. Rosenberg, Captain, USN (Retired) 

D. Rosenblatt, National Bureau of Standards 

J. V. Rosapepe, Commander, SC, USN (Retired) 
T. L. Saaty, University of Pennsylvania 

E. K. Scofield, Captain, SC, USN (Retired) 
M. W. Shelly, University of Kansas 

J. R. Simpson, Office of Naval Research 
J. S. Skoczylas, Colonel, USMC 
S. R. Smith, Naval Research Laboratory 
H. Solomon, The George Washington University 
I. Stakgold, Northwestern University 
E. D. Stanley, Jr., Rear Admiral, USN (Retired) 
C. Stein, Jr., Captain, SC, USN (Retired) 
R. M. Thrall, Rice University 
T. C. Varley, Office of Naval Research 
J. F. Tynan, Commander, SC, USN (Retired) 
J. D. Wilkes, Department of Defense 

The Naval Research Logistics Quarterly is devoted to the dissemination of scientific information in logistics and 
will publish research and expository papers, including those in certain areas of mathematics, statistics, and economics, 
relevant to the over-all effort to improve the efficiency and effectiveness of logistics operations. 

Information for Contributors is indicated on inside back cover. 

The Naval Research Logistics Quarterly is published by the Office of Naval Research in the months of March, June, 
September, and December and can be purchased from the Superintendent of Documents, U.S. Government Printing 
Office, Washington, D.C. 20402. Subscription Price: $10.00 a year in the U.S. and Canada, $12.50 elsewhere. Cost of 
individual issues may be obtained from the Superintendent of Documents. 

The views and opinions expressed in this quarterly are those of the authors and not necessarily those of the Office 

of Naval Research. 

Issuance of this periodical approved in accordance with Department of the Navy Publications and Printing Regulations.


Permission has been granted to use the copyrighted material appearing in this publication. 




S. Zacks and W. J. Fenske 

Department of Mathematics and Statistics 

Case Western Reserve University 

Cleveland, Ohio 


The problem of determining the optimal inspection epoch is studied for reliability
systems in which N components operate in parallel. The lifetime distribution is arbitrary, but
known. The optimization is carried out with respect to two cost factors: the cost of inspecting
a component and the cost of failure. The inspection epochs are determined so that the
expected cost of the whole system per time unit per cycle is minimized. The optimization
process depends in the general case on the whole failure history of the system. This
dependence is characterized. The cases of Weibull lifetime distributions are elaborated
and illustrated numerically. The characteristics of the optimal inspection intervals are
studied theoretically.


In the present study we investigate the problem of determining the optimal inspection epochs
of a reliability system which is comprised of N components, operating independently (in parallel)
and having the same known lifetime distribution. An inspector visits the system at a predetermined
inspection epoch and finds a certain number of components which have failed. The exact times of
failure are unknown. All the components which have failed during the interval between inspections
are replaced by new components; components which have not failed are left in the system. We consider
two types of cost factors: (i) the cost of inspection, which depends on the number of components in
the system; and (ii) the cost of failure per unit time, which measures the loss due to failed
components. The objective is to determine an inspection policy that is optimal with respect to the
criterion of minimizing the total expected (discounted) cost over the entire future. However, since
we are dealing with cases of general lifetime distributions (not necessarily exponential), the dynamic
programming solution is excessively complicated, even in the truncated case (when the number of
inspections may not exceed a prescribed bound). Therefore, we consider in the present paper a
sequential myopic procedure. Accordingly, after each inspection the epoch of the next inspection is
determined as a function of the whole past failure history of the system. The aim is to minimize the
conditional expected cost per time unit from the present time until the next inspection epoch. In the
case of exponential lifetime distributions (constant failure rates) the optimal inspection interval
(the time interval between inspections) does not depend on the past history of the system. As shown in
the present study, if the lifetime distribution is not exponential this dependence might be very
strong, especially if N is not large and the lifetime distribution has a decreasing failure rate (DFR).
The dependence of the optimal inspection intervals on the observed number of failures, and on the
number of components that were replaced at previous inspections and are still operating, is explicitly
characterized. We start in section 2 by formulating the model and the associated distributions. In
section 3 we develop a general formula for the sequential determination of the optimal length of the
inspection intervals. In section 4 we derive the corresponding formulas for lifetime distributions of
the Weibull family and illustrate the process with a numerical example. In section 5 we explain the
complex process illustrated in the example of section 4 by further theoretical development.

There are numerous papers in the reliability literature on inspection epochs and optimal maintenance.
For the general theory see chapter 4 of Barlow and Proschan [1]. Articles which are close to the
present study are those of Kamins [4], Kander [5], Kander and Naor [6], and Kander and Rabinovitch [7].
The present study provides further elaboration of a chapter in the thesis of Fenske [3]. The main
difference between the present study and the articles mentioned above is in the basic model: the
present study is concerned with multicomponent systems, while the other studies treat the whole system
as one component. The study of Ehrenfeld [2] was based on a model similar to ours, but Ehrenfeld
considered the problem of determining the inspection interval for the estimation of the mean time
between failures in the exponential case.


Consider a reliability system which consists of $N$, $N \ge 1$, components. These components operate
independently (in parallel). Let $T$ designate the lifetime of a component. This is a random variable
having a known distribution function (c.d.f.) $F(t)$. We assume that $F(t)$ is absolutely continuous,
with a positive density function $f(t)$, $0 < f(t) < \infty$, and $F(0) = 0$. We further assume that the
expected value of $T$, according to $F(t)$, is finite. Let $S_0 = 0$ and let
$S_0 < S_1 < S_2 < \cdots < S_m < \cdots$ designate a sequence of inspection epochs. Let $J_m$
($m = 1, 2, \ldots$) designate the number of components that failed during the time interval
$(S_{m-1}, S_m)$. All the $J_m$ components are replaced at the inspection epoch $S_m$. The $N - J_m$
components which have not failed during $(S_{m-1}, S_m)$ are classified into $m$ disjoint subsets
$A_0^{(m)}, A_1^{(m)}, \ldots, A_{m-1}^{(m)}$. The subset $A_j^{(m)}$ ($j = 0, \ldots, m-1$) contains all
the components that were replaced at epoch $S_j$ and did not fail throughout the time interval
$(S_j, S_m)$. Let $n_j^{(m)}$ designate the number of elements of $A_j^{(m)}$. Obviously,
$A_j^{(m+1)} \subseteq A_j^{(m)}$ and $n_j^{(m+1)} \le n_j^{(m)}$ for each $j = 0, 1, \ldots$ and
$m = j, j+1, \ldots$. Let $n_m^{(m)} = J_m$, and
$\mathbf{n}^{(m)} = (n_0^{(m)}, n_1^{(m)}, \ldots, n_m^{(m)})$ for each $m = 0, 1, \ldots$;
$n_0^{(0)} = N$.

If a component belongs to the subset $A_j^{(m)}$ then its conditional lifetime distribution at time $t$ is:

(2.1) $F_j^{(m)}(t) = P\{T \le t - S_j \mid T > S_m - S_j\} = \begin{cases} 0, & \text{if } t \le S_m \\[4pt] \dfrac{F(t - S_j) - F(S_m - S_j)}{1 - F(S_m - S_j)}, & \text{if } t > S_m. \end{cases}$

In particular, $F_m^{(m)}(t) = F(t - S_m)$ if $t > S_m$ and zero otherwise. The conditional densities of
$U = T - (S_m - S_j)$, corresponding to the lifetime $T$ of a component which belongs to $A_j^{(m)}$,
play an important role in our procedure. We call $U$ the remaining lifetime. If a component is chosen at
random at time $t = S_m^{+}$, its remaining lifetime $U$ has the conditional density function

(2.2) $h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) = \dfrac{1}{N} \sum_{j=0}^{m} n_j^{(m)}\, \dfrac{f(u + S_m - S_j)}{1 - F(S_m - S_j)}, \qquad u \ge 0,$

where $\mathbf{S}^{(m)} = (S_1, \ldots, S_m)$.

We notice that if $T$ has a negative exponential distribution, i.e., $f(t) = \lambda e^{-\lambda t}$,
$t \ge 0$, for any $0 < \lambda < \infty$, then
$h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) = f(u)$ for all $m = 1, 2, \ldots$ and all
$(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$. This is a well-known property of the negative exponential
distributions. Let $H_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)})$ designate the c.d.f. corresponding
to (2.2).

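The mixture density (2.2) is straightforward to evaluate numerically. The sketch below is our illustration, not part of the paper: it computes $h_m$ for an arbitrary known lifetime distribution and checks the memorylessness remark, namely that for exponential lifetimes the remaining-life density equals $f(u)$ regardless of the history $(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$.

```python
import math

def remaining_life_density(u, S, n, f, F):
    """Mixture density h_m(u | S^(m), n^(m)) of eq. (2.2): components are
    grouped by the epoch S_j at which they were last replaced."""
    m = len(S) - 1          # S = (S_0, S_1, ..., S_m) with S_0 = 0
    N = sum(n)              # n = (n_0^(m), ..., n_m^(m)), counts per age group
    total = 0.0
    for j in range(m + 1):
        age = S[m] - S[j]   # time in service for the group A_j^(m)
        total += n[j] * f(u + age) / (1.0 - F(age))
    return total / N

# exponential lifetimes: h_m(u) reduces to f(u), whatever (S, n) may be
lam = 0.01
f = lambda t: lam * math.exp(-lam * t)
F = lambda t: 1.0 - math.exp(-lam * t)
S, n = [0.0, 80.0, 170.0], [3, 2, 5]
print(abs(remaining_life_density(25.0, S, n, f, F) - f(25.0)) < 1e-12)  # True
```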

We consider in the present section the problem of deriving an inspection policy which attains a
certain economic objective. We assume that the cost of inspecting the system is $C_0$ per inspection
and that, on the other hand, if an element fails then the cost associated with its failure is $C_f$ per
time unit. The inspection policy adopted here is the following. Given the history connected with the
past $m$ inspection intervals, i.e., $(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$, determine the $(m+1)$st
inspection epoch so that the average expected cost per time unit of inspection and of failure, over
the $(m+1)$st inspection interval, is minimized. We remark in this connection that this policy is in
essence a myopic policy, which minimizes the expected time-average cost for each inspection interval
individually. A dynamic programming determination of the inspection epochs could attain a more global
optimization. However, attempts at dynamic programming solutions lead to complicated sets of recursive
functional equations, whose solution is generally very tedious. As will be shown later, the suggested
myopic procedure is globally optimal if the lifetime distribution is exponential. In other cases of
interest, like the Weibull distributions, the myopic procedure does not coincide with the global
dynamic programming solution. A study of the relative efficiency of the myopic procedure is still
under way.

Let $\Delta$ designate the length of the $(m+1)$st inspection interval; that is,
$\Delta = S_{m+1} - S_m$. Given $(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$, the conditional expected
average cost per time unit, under $\Delta$, is

(3.1) $R_m(\Delta; \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) = \dfrac{C_0}{\Delta} + \dfrac{C_f}{\Delta} \sum_{j=0}^{m} n_j^{(m)} \int_0^{\Delta} (\Delta - u)\, \dfrac{f(u + S_m - S_j)}{1 - F(S_m - S_j)}\, du.$

Or, in terms of the conditional distribution of the remaining lifetime $U$, we can express (3.1) in the form

(3.2) $R_m(\Delta; \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) = \dfrac{C_0}{\Delta} + N C_f\, H_m(\Delta \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) - \dfrac{N C_f}{\Delta} \int_0^{\Delta} u\, h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)})\, du.$

The optimal $(m+1)$st inspection epoch is defined as $S_{m+1} = S_m + \Delta^0$, where $\Delta^0$ is a
positive real value $\Delta$ at which the infimum of (3.2) is attained.

Let

(3.3) $\mu_m = \displaystyle\int_0^{\infty} u\, h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)})\, du$

be the expected remaining life, given $(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$. According to the
assumption of the previous section, $\mu_m < \infty$. Differentiating
$R_m(\Delta; \mathbf{S}^{(m)}, \mathbf{n}^{(m)})$ with respect to $\Delta$, we obtain that if
$\mu_m \le C_0/NC_f$ then $\Delta^0 = \infty$. This is a case in which no more inspections are
warranted. On the other hand, if $\mu_m > C_0/NC_f$, there exists a unique solution, $\Delta^0$, to
the equation:

(3.4) $\displaystyle\int_0^{\Delta} u\, h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)})\, du = C_0/NC_f.$

We realize from (3.4) that $S_{m+1}$ is a function of the statistic
$(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$ of the system.

As we have already mentioned, in cases of exponential lifetime distributions the optimal length of the
inspection intervals is the same for all $m = 1, 2, \ldots$. If $\theta = \lambda^{-1}$ is the mean
time between failures (MTBF) in the exponential case, then $\mu_m = \theta$ for all $m$, and the
condition for a finite $\Delta^0$ is that $C_0 < NC_f\theta$; i.e., the cost of inspecting an element
is smaller than the expected cost of failure of an element. If this condition is satisfied then,
letting $\gamma = C_0/NC_f\theta$, it is easy to show that

(3.5) $\Delta^0 = \dfrac{\theta}{2}\, \chi^2_{\gamma}[4],$

where $\chi^2_{\gamma}[4]$ designates the $\gamma$-fractile of a chi-square distribution with 4 degrees
of freedom.
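Formula (3.5) can be checked without chi-square tables: writing $x = \Delta^0/\theta$, equation (3.4) becomes $1 - e^{-x}(1+x) = \gamma$, whose root is half the $\gamma$-fractile of $\chi^2[4]$. The sketch below is ours (the bisection bracket and iteration count are arbitrary choices):

```python
import math

def optimal_interval_exponential(theta, gamma):
    """Solve 1 - exp(-x)(1 + x) = gamma by bisection and return
    Delta0 = theta * x, equivalent to (theta/2) * chi-square fractile of (3.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - math.exp(-mid) * (1.0 + mid) < gamma:
            lo = mid        # accumulated cost still below gamma: root is larger
        else:
            hi = mid
    return theta * 0.5 * (lo + hi)

# the numerical case of section 4: theta = 100 hr, gamma = C0/(N*Cf*theta) = 0.2
print(optimal_interval_exponential(100.0, 0.2))  # about 82.4 hr (82.5 in the text)
```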


Suppose that the lifetime of an element, $T$, follows a Weibull distribution, with a density function

(4.1) $f(t; \theta, \alpha) = \begin{cases} 0, & \text{if } t \le 0 \\[4pt] \dfrac{\alpha}{\theta}\, t^{\alpha-1} \exp\{-t^{\alpha}/\theta\}, & \text{if } t > 0, \end{cases}$

where $\alpha$ and $\theta$ are positive real parameters. We notice that if $0 < \alpha < 1$ then the
distribution has a decreasing failure rate (DFR), and if $1 < \alpha < \infty$ its failure rate is
increasing (IFR). When $\alpha = 1$ the distribution is exponential. Given
$(\mathbf{S}^{(m)}, \mathbf{n}^{(m)})$, the density function of the remaining life $U$ assumes the
special form

(4.2) $h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)}) = \dfrac{1}{N} \sum_{j=0}^{m} n_j^{(m)} \exp\{(S_m - S_j)^{\alpha}/\theta\} \cdot \dfrac{\alpha}{\theta}\, (u + S_m - S_j)^{\alpha-1} \exp\{-(u + S_m - S_j)^{\alpha}/\theta\},$

for $0 \le u < \infty$. When $m = 0$, (4.2) reduces to (4.1). Following the procedure given in the
previous section we realize that $S_1 < \infty$ if, and only if,

(4.3) $C_0/N < C_f\, \theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right),$

where $\theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right)$ is the expected lifetime. If
(4.3) is satisfied then the optimal value of $S_1$ is

(4.4) $S_1 = \left\{\theta\, G^{-1}\!\left(\gamma \mid 1, \dfrac{1}{\alpha} + 1\right)\right\}^{1/\alpha},$

where $G^{-1}(\gamma \mid p, \nu)$ is the $\gamma$-fractile of the Gamma distribution $G(p, \nu)$ with
scale parameter $p$, and where
$\gamma = C_0\Big/NC_f\, \theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right)$. We notice
that if $2/\alpha$ is a positive integer, then

(4.5) $S_1 = \left\{\dfrac{\theta}{2}\, \chi^2_{\gamma}\!\left[\dfrac{2}{\alpha} + 2\right]\right\}^{1/\alpha}.$
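Equivalently, $S_1$ is the root of $\int_0^\Delta u f(u; \theta, \alpha)\,du = C_0/NC_f$, which can be found with elementary quadrature instead of Gamma fractiles. The code below is our sketch (step counts and brackets are arbitrary); for $\alpha = 1$ it reproduces the exponential answer of section 3.

```python
import math

def weibull_density(t, theta, alpha):
    # eq. (4.1): f(t; theta, alpha) = (alpha/theta) t^(alpha-1) exp(-t^alpha/theta)
    return (alpha / theta) * t ** (alpha - 1.0) * math.exp(-t ** alpha / theta)

def first_inspection_epoch(theta, alpha, ratio, hi=2000.0):
    """Root Delta of int_0^Delta u f(u) du = ratio (= C0/(N*Cf)); cf. (4.4)-(4.5)."""
    def integral(delta, steps=4000):
        h = delta / steps   # midpoint rule; the integrand u*f(u) vanishes at u = 0
        return sum((i + 0.5) * h * weibull_density((i + 0.5) * h, theta, alpha) * h
                   for i in range(steps))
    lo = 0.0
    for _ in range(60):     # bisection on the monotone integral
        mid = 0.5 * (lo + hi)
        if integral(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# alpha = 1, theta = 100, ratio = C0/(N*Cf) = 20: about 82.4 hr, as in section 3
print(first_inspection_epoch(100.0, 1.0, 20.0))
```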

We now determine a general expression for the left-hand side of (3.4). According to (4.2),

(4.6) $\displaystyle\int_0^{\Delta} u\, h_m(u \mid \mathbf{S}^{(m)}, \mathbf{n}^{(m)})\, du = \dfrac{1}{N} \sum_{j=0}^{m} n_j^{(m)} \exp\{(S_m - S_j)^{\alpha}/\theta\} \cdot \dfrac{\alpha}{\theta} \int_0^{\Delta} u\, (u + S_m - S_j)^{\alpha-1} \exp\{-(u + S_m - S_j)^{\alpha}/\theta\}\, du.$

By a proper change of variable, we obtain

(4.7) $\dfrac{\alpha}{\theta} \displaystyle\int_0^{\Delta} u\, (u + S_m - S_j)^{\alpha-1} \exp\{-(u + S_m - S_j)^{\alpha}/\theta\}\, du = \int_{(S_m - S_j)^{\alpha}/\theta}^{(S_m - S_j + \Delta)^{\alpha}/\theta} \left[\theta^{1/\alpha} w^{1/\alpha} - (S_m - S_j)\right] \exp\{-w\}\, dw$

$\qquad = \theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right) \left[ G\!\left(\dfrac{(S_m - S_j + \Delta)^{\alpha}}{\theta};\, 1,\, \dfrac{1}{\alpha} + 1\right) - G\!\left(\dfrac{(S_m - S_j)^{\alpha}}{\theta};\, 1,\, \dfrac{1}{\alpha} + 1\right) \right] - (S_m - S_j)\left[\exp\left\{-\dfrac{(S_m - S_j)^{\alpha}}{\theta}\right\} - \exp\left\{-\dfrac{(S_m - S_j + \Delta)^{\alpha}}{\theta}\right\}\right].$

Substituting (4.7) into (4.6), we obtain that $S_{m+1} = S_m + \Delta^0$, where $\Delta^0$ is the root
of the equation:

(4.8) $\displaystyle\sum_{j=0}^{m} \dfrac{n_j^{(m)}}{N} \left( \exp\{(S_m - S_j)^{\alpha}/\theta\}\, \theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right) \left[ G\!\left(\dfrac{(S_m - S_j + \Delta)^{\alpha}}{\theta};\, 1,\, \dfrac{1}{\alpha} + 1\right) - G\!\left(\dfrac{(S_m - S_j)^{\alpha}}{\theta};\, 1,\, \dfrac{1}{\alpha} + 1\right) \right] - (S_m - S_j)\left[1 - \exp\left\{-\dfrac{1}{\theta}\left[(S_m - S_j + \Delta)^{\alpha} - (S_m - S_j)^{\alpha}\right]\right\}\right] \right) = \gamma\, \theta^{1/\alpha}\, \Gamma\!\left(\dfrac{1}{\alpha} + 1\right),$

where $\gamma$ is as before, and $G(x; p, \nu)$ is the c.d.f. of $G(p, \nu)$ at $x$.


We notice that for $m = 0$ the solution of (4.8) reduces to the one given by (4.4). In Figure 1 we
illustrate the solution of (4.8) for three Weibull distributions, where the $n_j^{(m)}$ sequences were
generated by Monte Carlo simulation. The cases under consideration have the following parameters:
$C_f = \$10$, $C_0 = \$200 \cdot N$, $\theta = 100$ [hr], and $\alpha = 3/4$, $1$, and $5/4$. The case
of $\alpha = 1$ corresponds to the exponential distribution with mean $\theta = 100$. According to
(3.5) the optimal inspection interval for $\alpha = 1$ is of length $50\, \chi^2_{\gamma}[4]$ [hr],
where $\gamma = C_0/N\theta C_f = 0.2$. One can find in statistical tables that
$\chi^2_{0.2}[4] = 1.65$. Hence, the optimal interval between inspections in the exponential case is of
length 82.5 hours. The case of $\alpha = 5/4$ represents an IFR distribution. We see in Figure 1 that
the optimal inspection intervals vary very little around 59 hours. It is interesting to notice that in
the present case of an IFR distribution the optimal inspection intervals do not depend strongly on the
number of components, $N$, in the system. This is not the case when the Weibull distribution is DFR
($\alpha = 3/4$). As illustrated in Figure 1, the optimal intervals for DFR distributions, as obtained
from (4.8), are sensitive to $N$. When $N = 10$ there are considerable fluctuations of the solution of
(4.8). When $N = 100$ these fluctuations diminish. The general trend of growth in the length of the
inspection intervals is, however, the same. An explanation of this phenomenon will be provided in the
next section. Finally we remark that the

Figure 1. Optimum inspection intervals for Weibull distributions with $C_f = \$10$, $C_0 = \$200N$, and $\theta = 100$ [hr]


numerical solution of Equation (4.8) in the case discussed here has been obtained by Newton-Raphson
iterative corrections to an initial solution. For further details see Fenske [3].


A characterization of the solution obtained from (3.4) is not a simple matter, since the inspection
epochs $S_2, S_3, \ldots$ are random variables depending on the random vectors $\mathbf{n}^{(m)}$ in
quite a complicated manner. We remark that the sequences $\{n_j^{(m)}:\ m = j, j+1, \ldots\}$ are
supermartingales, for each $j = 0, 1, 2, \ldots$; and $\lim_{m\to\infty} n_j^{(m)}$ exists.
Furthermore, if $S_m - S_{m-1} \ge \Delta$ for every $m$ then $\lim_{m\to\infty} n_j^{(m)} = 0$ for
each $j$. This property holds for any lifetime distribution $F$. In order to obtain certain theoretical
approximations to the distributions of the roots of (3.4) we consider a modified problem, in which for
each $m = 0, 1, \ldots$ the random variables $(n_j^{(m)},\ j = 0, \ldots, m)$ are replaced by some
fixed nonnegative values. More specifically, consider the distribution of the random variable

(5.1) $W_m = \dfrac{1}{N} \displaystyle\sum_{j=0}^{m} n_j^{(m)} \left[1 - F(S_m - S_j)\right]^{-1} \int_0^{\Delta} u\, f(u + S_m - S_j)\, du,$

in which the inspection epochs are predetermined fixed values. $W_m$ is the left-hand side of Equation
(3.4). Whenever $S_1, S_2, \ldots$ are fixed inspection epochs, the vector
$(n_0^{(m)}, n_1^{(m)}, \ldots, n_m^{(m)})$ has for each $m = 1, 2, \ldots$ a multinomial
distribution, with parameters $N$ and $(\theta_j^{(m)};\ j = 0, \ldots, m)$, where $\theta_j^{(m)}$ is
the probability that an element belongs to $A_j^{(m)}$, $\sum_{j=0}^{m} \theta_j^{(m)} = 1$. The
probabilities $\theta_j^{(m)}$ can be determined recursively according to the following formulae:

(5.2) $\theta_0^{(m)} = 1 - F(S_m),$

$\qquad \theta_j^{(m)} = \left[1 - \displaystyle\sum_{i=0}^{j-1} \theta_i^{(j)}\right]\left[1 - F(S_m - S_j)\right], \qquad j = 1, \ldots, m.$

It follows that for any fixed sequence of inspection epochs and for each $m = 0, 1, \ldots$

(5.3) $E\{n_j^{(m)}\} = N\theta_j^{(m)}, \qquad \mathrm{Var}\{n_j^{(m)}\} = N\theta_j^{(m)}\left(1 - \theta_j^{(m)}\right), \qquad j = 0, 1, \ldots, m,$

(5.4) $\mathrm{cov}\left(n_j^{(m)}, n_k^{(m)}\right) = -N\theta_j^{(m)}\theta_k^{(m)}, \qquad \text{all } 0 \le j < k \le m.$

From (5.2) and (5.3) we conclude that if the length of each inspection interval is not smaller than
$\Delta$ then, for any distribution $F$, $\lim_{m\to\infty} \theta_j^{(m)} = 0$ for each $j$.
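The recursion (5.2) is easy to implement for fixed epochs, and the constraint that the $\theta_j^{(m)}$ sum to one gives a built-in check. The sketch below is ours (the epochs and Weibull parameters are arbitrary illustrative values):

```python
import math

def group_probabilities(S, F):
    """theta_j^(m) of eq. (5.2) for fixed epochs S = (S_0=0, S_1, ..., S_m):
    theta_0^(m) = 1 - F(S_m);
    theta_j^(m) = [1 - sum_{i<j} theta_i^(j)] * [1 - F(S_m - S_j)]."""
    m = len(S) - 1
    def theta(j, r):
        if j == 0:
            return 1.0 - F(S[r])
        replaced = 1.0 - sum(theta(i, j) for i in range(j))  # P(replaced at S_j)
        return replaced * (1.0 - F(S[r] - S[j]))
    return [theta(j, m) for j in range(m + 1)]

theta0, alpha = 100.0, 0.75      # DFR Weibull case of section 4
F = lambda t: 1.0 - math.exp(-t ** alpha / theta0) if t > 0 else 0.0
probs = group_probabilities([0.0, 60.0, 125.0, 195.0], F)
print(abs(sum(probs) - 1.0) < 1e-9)  # the theta_j^(m) form a probability vector
```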


The variable $W_m$ is a linear combination of multinomial random variables. Its expectation is

(5.5) $\omega_m = E\{W_m\} = D_0^{(m)} + \displaystyle\sum_{j=1}^{m} \left[1 - \sum_{i=0}^{j-1} \theta_i^{(j)}\right] D_j^{(m)},$

where

(5.6) $D_j^{(m)} = \displaystyle\int_0^{\Delta} u\, f(u + S_m - S_j)\, du.$

The variance of $W_m$ is

(5.7) $\mathrm{Var}\{W_m\} = \dfrac{1}{N}\left\{\displaystyle\sum_{j=0}^{m} \dfrac{\theta_j^{(m)}\left(D_j^{(m)}\right)^2}{\left[1 - F(S_m - S_j)\right]^2} - \left(\sum_{j=0}^{m} \dfrac{\theta_j^{(m)} D_j^{(m)}}{1 - F(S_m - S_j)}\right)^2\right\}.$

We have shown that for any sequence of inspection epochs $\mathrm{Var}\{W_m\} = O(N^{-1})$ as
$N \to \infty$. This explains why the fluctuations of the roots of (4.8) are relatively large when
$N = 10$ and small when $N = 100$. We consider now a particular sequence of inspection epochs which
consists of values of $S_m$ obtained by the repeated solution (for each $m$) of the equation
$\omega_m = \gamma$, i.e.,

(5.8) $\displaystyle\int_0^{\Delta} u\, f(u + S_m)\, du + \sum_{j=1}^{m} \left[1 - \sum_{i=0}^{j-1} \theta_i^{(j)}\right] \int_0^{\Delta} u\, f(u + S_m - S_j)\, du = \gamma.$

$S_1$ is the root $\Delta$ of $\int_0^{\Delta} u\, f(u)\, du = \gamma$, and for each
$m = 1, 2, \ldots$, the $(m+1)$st inspection epoch is given by $S_{m+1} = S_m + \Delta$. The sequence
of fixed inspection epochs determined by this procedure corresponds to the expected values of
$\mathbf{n}^{(m)}$, and we therefore label this procedure the Procedure of Averages. In Table 1 we
provide the inspection intervals determined by the Procedure of Averages, and the corresponding
multinomial probabilities $\theta_j^{(m)}$ ($j = 0, \ldots, m$), for the two cases represented in
Figure 1. The graph of the corresponding inspection intervals for the case of $\alpha = 3/4$ (DFR) is
also plotted in Figure 1. As is demonstrated in Table 1, in the IFR case ($\alpha = 5/4$) the
significant contribution to the solution is expected to be that of $n_m^{(m)}$ and $n_{m-1}^{(m)}$, or
of their corresponding expected values. Furthermore, the optimal length of the inspection intervals
varies very little with the number of inspections, $m$, and its expectation reaches in the present
example a stable situation after two inspections. This is not the case, however, for the DFR
distribution ($\alpha = 3/4$). The probabilities $\theta_j^{(m)}$ approach zero, as $m$ grows, very
slowly. This is reflected in a steady increase in the length of the inspection intervals as $m$ grows,
and a stable situation is reached in the present example only after 10 inspections.
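The Procedure of Averages can be sketched end to end: at each step solve $\omega_m(\Delta) = C_0/NC_f$ with the random counts replaced by their expectations, then append the new epoch. The code below is our illustration (quadrature and bisection settings are ad hoc); for $\alpha = 1$ the intervals come out constant, as the theory requires.

```python
import math

def procedure_of_averages(theta, alpha, ratio, n_intervals=5):
    """Fixed epochs solving omega_m = C0/(N*Cf) (cf. (5.5)-(5.8)) for a
    Weibull lifetime; returns the successive interval lengths Delta."""
    f = lambda t: (alpha / theta) * t ** (alpha - 1.0) * math.exp(-t ** alpha / theta) if t > 0 else 0.0
    F = lambda t: 1.0 - math.exp(-t ** alpha / theta) if t > 0 else 0.0
    S, repl = [0.0], [1.0]   # repl[j] = 1 - sum_{i<j} theta_i^(j); repl[0] = 1
    intervals = []
    for _ in range(n_intervals):
        Sm = S[-1]
        def omega(delta, steps=2000):
            h = delta / steps
            tot = 0.0
            for w, Sj in zip(repl, S):
                age = Sm - Sj
                tot += w * sum((i + 0.5) * h * f((i + 0.5) * h + age) * h
                               for i in range(steps))   # midpoint rule
            return tot
        lo, hi = 0.0, 3000.0
        for _ in range(50):                             # bisection for Delta
            mid = 0.5 * (lo + hi)
            if omega(mid) < ratio:
                lo = mid
            else:
                hi = mid
        delta = 0.5 * (lo + hi)
        intervals.append(delta)
        S.append(Sm + delta)
        # weight for the group replaced at the new epoch, via recursion (5.2)
        repl.append(1.0 - sum(w * (1.0 - F(S[-1] - Sj)) for w, Sj in zip(repl, S[:-1])))
    return intervals

# exponential case: every interval equals the myopic optimum (about 82.4 hr)
print(procedure_of_averages(100.0, 1.0, 20.0, 3))
```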

To ensure that the inspection intervals discussed in sections 3 and 4 have properties similar to those
determined by procedures of fixed inspection epochs, we could consider the following adjustment. First,
determine for each $m = 1, 2, \ldots$ two fixed sequences of inspection epochs which constitute upper
and lower (confidence) limits for the solution of (3.4) (or (4.8)). This can be done by utilizing
formulae (5.5) and (5.7). The lower confidence limits could be obtained by repeated solution (for the
root $\Delta$) of the equation

Table 1. Values of optimal inspection intervals $\Delta$ [hr] and multinomial probabilities
$\theta_j^{(m)}$ under the Procedure of Averages for Weibull distributions with $\theta = 100$ [hr]
and cost components $C_0 = \$200N$, $C_f = \$10$
(columns: opt. $\Delta$; $\theta_j^{(m)}$ for $j = 0, \ldots, 9$)

Case I: $\alpha = 5/4$ (IFR)

Case II: $\alpha = 3/4$ (DFR)

(5.9) $\omega_m + 3\left[\mathrm{Var}\{W_m\}\right]^{1/2} = \gamma, \qquad m = 1, 2, \ldots.$

The upper limit can be obtained by solving the equation

(5.10) $\omega_m - 3\left[\mathrm{Var}\{W_m\}\right]^{1/2} = \gamma, \qquad m = 1, 2, \ldots.$

In the second phase of computation solve Equation (3.4). If the solution lies between the roots of
(5.9) and (5.10), proceed; otherwise truncate the solution to either the lower limit or the upper
limit, whichever is closer to the actual solution. Such an adjustment guarantees that every inspection
interval is bounded by lower and upper values which are determined by fixed sequences of inspection
epochs, and therefore has the general characteristics established here.


[1] Barlow, R. E. and F. Proschan, Mathematical Theory of Reliability (John Wiley and Sons, New York, 1967).

[2] Ehrenfeld, S., "Some Experimental Design Problems in Attribute Life Testing," J. Am. Stat. Assoc.

[3] Fenske, W. J., "Optimal Inspection Epochs for Reliability Studies," Ph.D. Dissertation, Department of Mathematics and Statistics, Case Western Reserve University (1972).

[4] Kamins, M., "Determining Checkout Intervals for Systems Subject to Random Failures," The Rand Corporation, Paper RM-2578 (1960).

[5] Kander, Z., "Inspection Policies of Deteriorating Equipment Characterized by N Quality Levels," Technion-Israel Institute of Technology, Operations Research Monograph No. 93 (1971).

[6] Kander, Z. and P. Naor, "Optimization of Inspection Policies by Dynamic Programming," Technion-Israel Institute of Technology, Operations Research Monograph No. 61 (1970).

[7] Kander, Z. and A. Rabinovitch, "Maintenance Policies When Failure Distribution of Equipment is Only Partially Known," Technion-Israel Institute of Technology, Operations Research Monograph No. 92 (1972).




G. Kemble Bennett 

Virginia Polytechnic Institute and State University 


H. F. Martz 

Texas Tech University 


An empirical Bayes estimator is given for the scale parameter of the two-parameter
Weibull distribution. The scale parameter is assumed to vary randomly throughout a sequence
of experiments according to a common, but unknown, prior distribution. The shape
parameter is assumed to be known; however, it may be different in each experiment. The
estimator is obtained by means of a continuous approximation to the unknown prior density
function. Results from Monte Carlo simulation are reported which show that the estimator
has smaller mean-squared error than the usual maximum-likelihood estimator.


A large number of authors have considered estimation of the parameters of the Weibull distribution
by the method of maximum likelihood, the method of moments, and numerous other classical techniques.
Frequently, however, the parameters of the Weibull distribution are subject to random variation, and
an analysis which encompasses this feature is better suited. Such an analysis has been performed by
Soland for the cases where the scale parameter is treated as a random variable [5] and where both the
shape and scale parameters are treated as random variables [6]. These approaches exhibit a Bayesian
viewpoint as adequately described by Raiffa and Schlaifer [4]. Emphasis is placed on determining
conjugate prior distributions and on performing both terminal and preposterior analysis. In this paper
we obtain empirical Bayes estimates for the scale parameter. This approach, like the Bayesian
approach, allows for the assumption of a randomly varying scale parameter. The analysis, however, does
not require any specific assumptions as to the distributional form of this parameter. Since this
distribution generally remains unknown, the empirical Bayes approach can, in a large majority of
cases, be successfully applied. Application would certainly be warranted, for example, in reliability
life-testing situations where the lifetime distribution of items subjected to routine testing is
adequately described by a Weibull distribution but where the scale parameter varies from test to test.


Consider the situation in which we observe a value $t$ (which may be vector valued) from a Weibull
density function given by

(1) $f(t \mid \lambda) = \lambda \beta t^{\beta - 1} e^{-\lambda t^{\beta}},$

and must estimate the parameter $\lambda$ with small squared error. The shape parameter, $\beta$, is
assumed to be known; however, the scale parameter, $\lambda$, which determines $t$, is itself assumed
to be a realization of an unobservable random variable. Furthermore, it is assumed that this
estimation process is a routinely recurring situation. Therefore, as the process is repeated we obtain
a sequence of realizations of independent and identically distributed random variables
$t_1, t_2, \ldots, t_n$. Our problem is to determine an estimator
$\hat\lambda_n = \hat\lambda_n(t_1, \ldots, t_n)$ which minimizes $E(\hat\lambda_n - \lambda_n)^2$,
where the expectation is taken with respect to all the random variables involved and where $\lambda_n$
is the $n$th or current realization of $\lambda$. Since $\lambda$ is itself a random variable, this
minimizing estimator is well known to be the Bayes estimator, the mean of the posterior distribution.
This estimator can be represented by


(2) $E(\lambda \mid t) = \dfrac{\int \lambda f(t \mid \lambda)\, g(\lambda)\, d\lambda}{\int f(t \mid \lambda)\, g(\lambda)\, d\lambda},$

where $g(\lambda)$ is the true prior density function of $\lambda$.

Since the prior density usually remains unknown in practice, the estimator $E(\lambda \mid t)$ cannot
be exactly determined. It can, however, be approximated using the information, $t_1, t_2, \ldots, t_n$,
obtained from previous realizations of $\lambda$. Such an estimator is commonly referred to as an
empirical Bayes estimator. For a detailed discussion of the empirical Bayes approach the reader is
referred to, for example, Maritz [2].

To illustrate this situation, consider a repetitive testing program in which the time-to-failure
density of tested items is given by (1). During each test a sample of $k$ failure times is observed
from (1) and an estimate of $\lambda$ is to be given. For example, at the first test a sample of $k$
failure times is recorded and an estimate of $\lambda_1$ is required. At the second test an additional
random sample of $k$ failure times is obtained and an estimate of $\lambda_2$, which may be different
from $\lambda_1$, is required. This situation continues until the present or $n$th test is completed,
at which time an estimate of $\lambda_n$ is to be given. Due to changing environmental conditions from
test to test, imperfect testing equipment, interactions of population components, etc., the values
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are not likely to be equal, but to vary unpredictably and
thus randomly. This variation can therefore be described by a prior density function. However, since
the values of $\lambda$ remain unknown, specification of $g(\lambda)$ can often be risky. In the
situation described here the observed experimental data, $t_1, t_2, \ldots, t_n$, are used to
approximate $g(\lambda)$, thereby relieving the experimenter of the task of specifying the form of
$g(\lambda)$.


Suppose that at each experiment $j$ comprising a testing program a random sample from (1) is taken
and a maximum-likelihood estimate

(3) $\hat\lambda_k = k \Big/ \displaystyle\sum_{i=1}^{k} t_i^{\beta}$

is formed. Then the sequence of estimates

(4) $\hat\lambda_{k,1}, \hat\lambda_{k,2}, \ldots, \hat\lambda_{k,n}$

provides a source of information on the past behavior of $\lambda$. Based on this sequence a linear
transformation can be performed which yields a new sequence of values

(5) $\lambda_1^{*}, \lambda_2^{*}, \ldots, \lambda_n^{*},$

which when considered collectively have a mean and variance approximating those of the random
parameter $\lambda$. This sequence can now be used to approximate the prior density function,
$g(\lambda)$. Proper substitution of the approximation into (2) will yield an empirical Bayes estimate
for $\lambda_n$, the $n$th or present realization of $\lambda$.

The particular density approximation chosen is described by Parzen [3]. He presents a consistent
density estimator of the form

(6) $g_n(\lambda) = \dfrac{1}{n\, h(n)} \displaystyle\sum_{j=1}^{n} W\!\left(\dfrac{\lambda - \lambda_j}{h(n)}\right),$

where $W(\cdot)$ is a weighting function satisfying certain boundedness and regularity conditions, and
$h(n)$ a smoothing constant so chosen that $\lim_{n\to\infty} h(n) = 0$ and
$\lim_{n\to\infty} n\, h(n) = \infty$. These restrictions are placed on $h(n)$ to assure the
consistency of $g_n(\lambda)$. Parzen also lists several possible representations for $W(\cdot)$.
Using these results, Bennett and Martz [1] suggest the replacement of each unobservable $\lambda_j$ in
(6) by its corresponding transformed estimate $\lambda_j^{*}$ and form the density approximation

(7) $g_n(\lambda) = \dfrac{1}{n\, h(n)} \displaystyle\sum_{j=1}^{n} W\!\left(\dfrac{\lambda - \lambda_j^{*}}{h(n)}\right).$

Subsequent substitution of (7) into (2) yields the empirical Bayes estimator

(8) $E_n(\lambda \mid t) = \dfrac{\int \lambda f(t \mid \lambda)\, g_n(\lambda)\, d\lambda}{\int f(t \mid \lambda)\, g_n(\lambda)\, d\lambda}.$
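Equation (7) is an ordinary Parzen kernel estimate evaluated at the transformed values. A minimal sketch (ours), using the Gaussian weighting function, one of the admissible choices; whatever the sample, the resulting estimate integrates to one:

```python
import math

def parzen_estimate(lam, points, h):
    """g_n(lambda) of eq. (7): average of scaled kernels W((lambda - l*_j)/h)."""
    W = lambda y: math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return sum(W((lam - p) / h) for p in points) / (len(points) * h)

stars = [0.8, 1.1, 1.3, 0.9, 1.6, 1.2]   # stand-ins for the transformed MLEs
step = 0.01
# Riemann sum over a grid wide enough to cover every kernel's support
mass = sum(parzen_estimate(x * step, stars, 0.3) * step for x in range(-500, 1000))
print(mass)  # close to 1: the kernel estimate is a bona fide density
```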

The maximum-likelihood estimator, $\hat\lambda_k$, given by (3) is well known to be both consistent
and sufficient for estimating $\lambda$, and can easily be shown to have the conditional density
function

(9) $f(\hat\lambda_k \mid \lambda) = \left[k\lambda/\hat\lambda_k\right]^{k+1} \exp\left[-k\lambda/\hat\lambda_k\right] \Big/ \left[\Gamma(k)\, k\lambda\right],$

with mean and variance given by

(10) $E(\hat\lambda_k \mid \lambda) = k\lambda/(k-1) \quad\text{and}\quad \mathrm{Var}(\hat\lambda_k \mid \lambda) = (k\lambda)^2\big/(k-1)^2(k-2),$

respectively. Since the maximum-likelihood estimator is sufficient for estimating $\lambda$, the Bayes
estimator $E(\lambda \mid t)$, as defined by (2), can be conveniently written as
$E(\lambda \mid \hat\lambda_k)$, and the corresponding empirical Bayes estimator becomes

(11) $E_n(\lambda \mid \hat\lambda_k) = \dfrac{\int \lambda f(\hat\lambda_k \mid \lambda)\, g_n(\lambda)\, d\lambda}{\int f(\hat\lambda_k \mid \lambda)\, g_n(\lambda)\, d\lambda},$

where $f(\hat\lambda_k \mid \lambda)$ and $g_n(\lambda)$ are given by (9) and (7), respectively.
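The moments in (10) are easy to confirm by simulation: since $t_i^\beta$ is exponential with rate $\lambda$, the MLE (3) is $k$ divided by a Gamma$(k, \lambda)$ variate. A sketch (ours; the seed and sample sizes are arbitrary):

```python
import math, random

def simulate_mle(lam, k, rng):
    """One draw of the MLE (3): t_i^beta ~ Exp(lam), so we sample the sum
    of k exponentials directly (beta drops out of the MLE's distribution)."""
    total = sum(-math.log(1.0 - rng.random()) / lam for _ in range(k))
    return k / total

rng = random.Random(12345)
lam, k, reps = 2.0, 5, 200_000
draws = [simulate_mle(lam, k, rng) for _ in range(reps)]
mean = sum(draws) / reps
print(abs(mean - k * lam / (k - 1)) < 0.05)  # E = k*lam/(k-1) = 2.5, per (10)
```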


Since the actual range of $\lambda$ remains unknown, the region of integration in (8) and (11) must be
determined empirically. This can be satisfactorily resolved by taking the region of integration to be
the observed range of the estimates upon which the prior density approximation is based. Thus, it is
only necessary to order successively the estimates $\lambda_1^{*}, \lambda_2^{*}, \ldots, \lambda_n^{*}$
to obtain the region of integration. Alternatively, the positive half of the real line could be used.

Let us now consider the linear transformation of $\hat\lambda_k$ defined by

(12) $\lambda^{*} = C^{1/2}\left[\hat\lambda_k - E(\hat\lambda_k)\right] + E(\lambda),$

where

(13) $C = \mathrm{Var}(\lambda)\big/\mathrm{Var}(\hat\lambda_k).$

The mean and variance of $\lambda^{*}$ are easily verified to be equivalent to those of the random
variable $\lambda$, i.e., $E(\lambda^{*}) = E(\lambda)$ and
$\mathrm{Var}(\lambda^{*}) = \mathrm{Var}(\lambda)$. If the mean and variance of $\lambda$ were known,
then the transformation could be applied to each of the maximum-likelihood estimates of sequence (4),
forming sequence (5). Since the mean and variance of $\lambda$ generally remain unknown, estimates of
these quantities are required. Using relationships of conditional probability, we have from (10) that

(14) $E(\hat\lambda_k) = E\left[E(\hat\lambda_k \mid \lambda)\right] = \left[k/(k-1)\right] E(\lambda)$

and that

(15) $\mathrm{Var}(\hat\lambda_k) = \mathrm{Var}\left[E(\hat\lambda_k \mid \lambda)\right] + E\left[\mathrm{Var}(\hat\lambda_k \mid \lambda)\right] = \left[k/(k-1)\right]^2 \mathrm{Var}(\lambda) + \left[k^2\big/(k-1)^2(k-2)\right] E(\lambda^2).$
From (14), the prior mean can be consistently estimated by

(16) Ê(λ) = [(k−1)/k] λ̄_n,

where λ̄_n denotes the sample mean (1/n) Σ_{j=1}^n λ̂_{k,j}. If in (15) Var(λ̂_k) is replaced by the sample variance S_n² = Σ_{j=1}^n (λ̂_{k,j} − λ̄_n)²/n, E(λ²) is replaced by the relation E(λ²) = E²(λ) + Var(λ), and E(λ) is replaced by its estimate (16), then upon solving for Var(λ) we obtain

(17) V̂ar(λ) = [(k−2)S_n² − λ̄_n²](k−1)/k²

as an estimate of Var(λ). Proper substitution of the above results into (12) yields

(18) λ̂* = Ĉ^{1/2}[λ̂_k − λ̄_n] + [(k−1)/k]λ̄_n,

where

(19) Ĉ = [(k−1)/k²][(k−2) − λ̄_n²/S_n²].

Thus, the transformation defined by (12) is completely determined and the empirical Bayes estimator given by (11) can be formed.
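The moment-based transformation (18)–(19) can be sketched as follows (hypothetical function name; the guard against a negative variance estimate in (17), which can occur for small samples, is our own addition, not part of the paper):

```python
import numpy as np

def transform_mles(mles, k):
    """Apply the linear transformation (18)-(19) to a sequence of MLEs,
    matching their first two moments to the method-of-moments estimates
    (16)-(17) of the prior mean and variance."""
    lam = np.asarray(mles, float)
    lam_bar = lam.mean()                    # sample mean
    s2 = lam.var()                          # sample variance S_n^2
    prior_mean = (k - 1) / k * lam_bar                        # (16)
    c = (k - 1) / k ** 2 * ((k - 2) - lam_bar ** 2 / s2)      # (19)
    c = max(c, 0.0)    # guard: (17) can go negative in small samples
    return np.sqrt(c) * (lam - lam_bar) + prior_mean
```

By construction the transformed sequence has mean [(k−1)/k]λ̄_n exactly, and (since Ĉ < 1 always) a smaller variance than the raw MLE sequence, i.e., it shrinks the estimates toward the estimated prior mean.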


To ascertain the usefulness of the empirical Bayes estimator E_n(λ | λ̂_k), Monte Carlo simulation was employed on a UNIVAC 1108 computer. The criterion of comparison chosen was mean-squared error, and the widely utilized maximum likelihood estimator was the measurement reference. Therefore, the ratio

(20) R = (empirical Bayes mean-squared error) / (maximum likelihood mean-squared error)

was of interest.
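A compact Monte Carlo sketch shows how a ratio of the form (20) is built. Note the stand-ins: a conjugate Gamma prior is used here so that the Bayes estimator has a closed form (the posterior mean), whereas the paper uses the smooth empirical Bayes estimator and Pearson priors; the function name and default parameters are ours:

```python
import numpy as np

def mse_ratio(a=3.0, b=1.0, k=5, n_runs=2000, seed=0):
    """Monte Carlo estimate of a ratio like (20) for exponential data with
    lambda ~ Gamma(a, b), using the exact posterior mean as a stand-in
    for the empirical Bayes estimator."""
    rng = np.random.default_rng(seed)
    lam = rng.gamma(a, 1.0 / b, n_runs)     # one realization per experiment
    t_sum = rng.gamma(k, 1.0 / lam)         # total time on test: sum of k times
    mle = k / t_sum                         # the MLE of (3)
    bayes = (a + k) / (b + t_sum)           # posterior mean of lambda
    return np.mean((bayes - lam) ** 2) / np.mean((mle - lam) ** 2)
```

With these defaults the ratio comes out well below one, mirroring the kind of improvement the simulation reports.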

In the simulation, a value of λ was randomly generated from a chosen prior density function selected from the Pearson family of distributions [7]. Then a random sample t₁, t₂, …, t_k of size k corresponding to the realization λ was obtained from (1). The maximum likelihood estimate λ̂_k was then computed from (3), and its squared deviation (λ − λ̂_k)² from the corresponding realization of λ was calculated. For the second experiment, a new value of λ was generated and the process repeated, obtaining λ̂_k and its squared deviation. For this experiment, E₂(λ | λ̂_k) and its squared deviation [λ − E₂(λ | λ̂_k)]² from the corresponding realization of λ were also calculated. This was repeated 20 times, and each time E_n(λ | λ̂_k) was calculated using the present λ̂_k as well as all previous maximum likelihood estimates. Five hundred repetitions of this run of 20 experiments were then made, and the averages of the squared deviations of λ̂_k and E_n(λ | λ̂_k) were formed as estimates of E(λ − λ̂_k)² and E[λ − E_n(λ | λ̂_k)]², respectively. Then the ratio R was calculated utilizing these estimated mean-squared
errors. All numerical integrations were performed by means of the 11-point Gauss quadrature formula, and the weighting function W(·) in (7) was taken to be

W(y) = (1/π)[sin(y)/y]²,

where y = (λ − λ̂*_j)/2h(n) and h(n) = n^{−1/5}.

This procedure was repeated for all types of Pearson prior distributions, and the ratio R was observed to be significantly influenced by the prior distribution only through the ratio of the conditional variance of the maximum likelihood estimator to the prior variance of λ. This value can be represented as

(21) Z = k²E²(λ) / [(k−1)²(k−2) Var(λ)],

where E(λ), the prior mean of λ, has been substituted for λ. Since the only factors affecting the ratio R, apart from the number of experiences, are contained in (21), this quantity can be conveniently used to summarize and index a given situation.
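Both the conditional moments (10) and the index (21) are easy to check numerically. The sketch below (hypothetical names) draws the MLE's sampling distribution directly from the Gamma representation of the total time on test:

```python
import numpy as np

def mle_moments(lam=2.0, k=20, n_sim=200_000, seed=3):
    """Monte Carlo check of (10): for exponential data with rate lam,
    lam_k = k / sum(t_i) has mean k*lam/(k-1) and variance
    (k*lam)**2 / ((k-1)**2 * (k-2))."""
    rng = np.random.default_rng(seed)
    # sum of k exponential(lam) times is Gamma(k, scale=1/lam)
    mle = k / rng.gamma(k, 1.0 / lam, n_sim)
    return mle.mean(), mle.var()

def summary_index_Z(k, prior_mean, prior_var):
    """The summary quantity Z of (21), with E(lambda) in place of lambda."""
    return k ** 2 * prior_mean ** 2 / ((k - 1) ** 2 * (k - 2) * prior_var)
```

For k = 20 and λ = 2, for example, (10) gives mean 40/19 ≈ 2.105 and variance 1600/(361·18) ≈ 0.246, and the simulated moments agree closely.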



It was generally observed in the simulation that the ratio R varied only slightly for a given value of n, provided the value of Z remained invariant from distribution to distribution. These results indicate the robustness of the smooth empirical Bayes estimator to the form of the prior distribution. Also, it was observed that as Z increases, the values of the ratio R decrease. This phenomenon is best understood by considering the summary quantity Z. If Var(λ̂_k | λ) is large compared to Var(λ), then the maximum likelihood estimate of λ will vary widely. The empirical Bayes estimator, however, is capable of detecting this variation and can use this information to obtain better estimates of λ. Conversely, if Var(λ̂_k | λ) is small compared to Var(λ), then the maximum likelihood estimator would be expected to do quite well. In this case there is a great deal of information within an experiment, and previous experiments contribute very little information about the parameter. Even then, however, the maximum likelihood estimator never surpasses the empirical Bayes estimator.

Values of R are plotted in Figure 1 as a function of n, the number of past experiences, for different values of Z ranging from 0.5 to 5.0. For ease of presentation, curves have been smoothed through the actual data points.



FIGURE 1. Ratio of the average squared-error of E_n(λ | λ̂_k) to the average squared-error of λ̂_k for several values of Z (Z = 0.5, 1.0, 2.5, 5.0).

Figure 2 illustrates a typical comparison between the improvement realized by using the linear transformation defined by (18) and that obtained without incorporating this feature into the analysis. The dotted line represents the ratio R formed with an empirical Bayes estimator whose prior density approximation is based directly on the sequence of maximum likelihood estimates (4). The solid line represents the ratio R formed with the empirical Bayes estimator as defined by (11); here the prior density approximation is based on the transformed sequence given in (5). Both curves illustrate the improvement over the maximum likelihood estimator achieved by the two empirical Bayes estimators. Note, however, that the solid line is significantly lower than the dotted line. This result was repeatedly reproduced in the simulation for all values of Z considered.




















FIGURE 2. Typical comparison of the ratio R with and without the linear transformation on λ̂_k; (———) with, (- - -) without.


[1] Bennett, G. K., and H. F. Martz, "A Continuous Empirical Bayes Smoothing Technique," Biometrika.

[2] Maritz, J. S., Empirical Bayes Methods (Methuen and Co., Ltd., London, England, 1970).

[3] Parzen, E., "On Estimation of a Probability Density Function and Mode," Ann. Math. Statist. 33, 1065-1076 (1962).

[4] Raiffa, H., and R. Schlaifer, Applied Statistical Decision Theory (Harvard Graduate School of Business Administration, 1961).

[5] Soland, R. M., "Bayesian Analysis of the Weibull Process With Unknown Scale Parameter and Its Application to Acceptance Sampling," IEEE Trans. Reliability R-17, 84-90 (1968).

[6] Soland, R. M., "Bayesian Analysis of the Weibull Process With Unknown Scale and Shape Parameters," IEEE Trans. Reliability R-18, 181-184 (1969).

[7] Thomas, D. G., "Computer Methods for Generating Pseudo-Random Numbers from Pearson Distributions and Mixtures of Pearson and Uniform Distributions," Unpublished Master of Science Thesis, Virginia Polytechnic Institute and State University (1966).


Claude G. Henin 

Faculty of Management Sciences 

University of Ottawa 



In the present paper, we solve the following problem: determine the optimum redundancy level to maximize the expected profit of a system bringing constant returns over a time period T; i.e., maximize the expression P ∫₀ᵀ R dt − C, where P is the return of the system per unit of time, R the reliability of this system, C its cost, and T the period for which the system is supposed to work.

We present theoretical results so as to permit the application of a branch and bound algorithm to solve the problem. We also define the notion of consistency, thereby determining the distinction of two cases and the simplification of the algorithm for one of them.


In [4] we described different methods for solving the following problem. A serial system made of n independent stages has a reliability of R and a cost of C for a mission of a certain duration. If the system functions throughout the whole mission, the resultant revenue is P dollars. The problem therefore was to maximize the expression PR − C, where R and C are increasing functions of the number of standby units at each stage.

In this paper, we will consider a similar but more practical problem: we will suppose that, when working, the system produces a certain revenue per unit of time. This seems more likely to occur in real-life problems. For example, consider the orbiting of a commercial satellite or the placement of a submarine cable, which are sources of continual revenue as long as they function properly. In both cases, the reliability of the system can be increased significantly through redundancy before the system begins operations; however, the system can only be repaired with difficulty once it fails. The reliability of the system is equal to the product of the reliabilities R_i of each stage i. At each stage, we have m_i components; i.e., one basic unit and (m_i − 1) standbys. At each instant t,* the reliability R_i is an increasing function of m_i, the number of components.

The cost of a stage is m_i c_i, where c_i is the acquisition cost of one component of type i, and the system returns a net revenue of P per unit of time while functioning. The problem is to maximize the profit for a period T (where T can be infinite); i.e., to maximize the expression

(1) P ∫₀ᵀ R(m₁, …, m_n; t) dt − Σ_{i=1}^n c_i m_i,

where R(m₁, …, m_n; t) = ∏_{i=1}^n R_i(m_i, t).

*The reliability R_i(t) of stage i is the probability that stage i is still working by time t. We neglect the influence of switching devices, which could diminish the reliability of stage i if m_i becomes too large.


396 C. G. HENIN 

The functions R_i(m_i, t) are non-increasing in t, because the reliability of each stage cannot increase with time.

Another problem arises if we suppose that the returns are discounted at a rate r. In this case, the problem becomes

(2) max P ∫₀ᵀ e^{−rt} R(m₁, …, m_n; t) dt − Σ_{i=1}^n c_i m_i.

But it is clear that (2) is identical to (1) with the factor e^{−rt} included in R; i.e., if we replace R in (1) by R' = e^{−rt}R. For computational reasons, it is easier to replace each R_i (i = 1, …, n) by R_i' = e^{−rt/n}R_i, which has the same effect as replacing R by R'. With this last transformation the treatment of the two problems is identical, and we shall restrict ourselves to the analysis of the first one. We shall analyze the properties of function (1) and indicate how this problem can be solved with the help of the methods described in [4].


In order to analyze the function (1), we must define an important property. Let m denote the n-vector (m₁, …, m_n), and let us consider a subset S of indices among all the possible m. This subset S is said to be consistent with respect to the failure law of the components and the structure of the system if the following property holds: if for some m, m' ∈ S and for some t ≤ T, R(m, t) ≥ R(m', t), then R(m, t') ≥ R(m', t') for all t' ≤ T. Consistency for a set S implies that the reliability orderings among all the vectors belonging to S remain constant between zero and T.

By taking T sufficiently small and by taking an upper bound N on the number of components at each stage i, it is always possible to find consistent reliabilities. Indeed, if we have n stages, we have at most N^n reliability functions. As this number is finite, it is always possible to find consistent reliabilities for [0, T'], where T' is smaller than the smallest positive intersection point of these N^n functions.

It seems impossible to provide general necessary and sufficient conditions to insure consistency on the whole set of reliabilities. However, if the reliability function at a given stage can be written as R_i(m, t) = ρ_i(m)g_i(t), then the reliabilities are consistent.

On the other hand, it is easy to find nonconsistent reliability functions. Numerical tests show that loaded standbys with exponential failure laws do not give consistent reliabilities for general values of T. As another case, consider a two-stage system with components in a loaded mode and having a linear failure law; i.e., the reliability of the system is R = [1 − (λ₁t)^{m₁}][1 − (λ₂t)^{m₂}]. We shall show that for λ₁ ≥ λ₂ we can find two reliability curves which intersect each other. It is sufficient, for example, to take m = (m₁, m₂) = (4, 1) and m' = (m₁', m₂') = (2, 2); T must be less than 1/λ₁. For t > λ₂/λ₁, R(m, t) is larger than R(m', t), but for t < λ₂/λ₁ it is smaller. Therefore, for λ₂/λ₁ < T < 1/λ₁, the two reliability curves intersect each other and are inconsistent. For T smaller than λ₂/λ₁, they do not intersect and are consistent. For larger numbers of stages, it becomes very difficult (computationally) to see if the reliability curves intersect each other.

Therefore, the problem posed in this paper is more difficult than the one solved in [4]. For example, there is generally no sense in creating an undominated sequence of allocations at a given time as in [5]. However, if we compute an undominated sequence at a given time, and if the terms (i.e., the vectors m of this optimal sequence) are consistent for all t between 0 and T for all possible sequences, then we have the following theorem:


THEOREM I: If the set of vectors m of an undominated sequence at a given time t is consistent (i.e., if the undominated sequence remains the same for all t), then the optimal solution to (1) corresponds to a term of this undominated sequence.

PROOF: Suppose the contrary, namely that the optimal solution is given by a vector m not belonging to the optimal sequence and with a cost of C. Consider the two successive terms in the optimal sequence of costs C' and C'' such that C' < C ≤ C''. Then, by definition of the optimal sequence and by consistency, the reliability R(t) of our solution is always smaller than the corresponding reliability R'(t) of the term costing C' in the optimal sequence. Therefore, P ∫₀ᵀ R(t) dt − C is smaller than P ∫₀ᵀ R'(t) dt − C', and we arrive at a contradiction.

If the optimal sequence varies from one point in time to another, it is not true that the optimal solution to (1) is always a term in any optimal sequence. It can be a term which never appears in any optimal sequence at any time. Therefore, the use of optimal sequences is interesting only when they are identical at any time between zero and T. By extension, we shall call such an optimal sequence consistent.* As shown by our former example, it is very difficult to establish the consistency of two reliability curves even in the case of a very simple failure law. A fortiori, it is very difficult if not impossible to establish such a property for a set, such as an optimal sequence, which cannot be formally defined mathematically. Therefore, the only way of checking the consistency of an undominated sequence is to compute optimal sequences for different times and see if they are identical. If they are for a reasonable number of trials, it can be assumed that the undominated sequence is consistent. Naturally, consistency on the optimal sequence is a far weaker restriction than consistency on the whole set of reliabilities.


If the standbys are in a loaded mode and if the unreliability of a component at stage i is p_i(t), formula (1) becomes

P ∫₀ᵀ ∏_{i=1}^n {1 − [p_i(t)]^{m_i}} dt − Σ_{i=1}^n m_i c_i = P ∫₀ᵀ R(m, t) dt − C,

taking C = Σ_{i=1}^n m_i c_i (see [4]).

For such a situation, we have the following properties; proofs from our previous papers can be used to verify them.

PROPERTY I: If the standby units at each stage are in a loaded mode, if p_i(t) ≤ p_j(t) for all t between 0 and T, and if c_i ≤ c_j, then at the optimal solution the optimal number of components at stage i, m_i*, is larger than or equal to the optimal number of components at stage j, m_j*.

PROOF: See the proof of Theorem I in [3].

This property reduces the number of solutions we must consider when using a branch and bound algorithm.
*This property implies more than consistency on the set of indices of the optimal sequence and less than consistency for 
all the possible indices. 


PROPERTY II: If P increases, ∫₀ᵀ R dt and C are nondecreasing.

PROOF: As in the proof of Theorem VI in [4].

COROLLARY: If the undominated sequence at any time is consistent, then R and C are nondecreasing functions of P.

This property and its corollary yield a lower bound on the cost, which can be used if a solution to the problem is known for a certain value of P smaller than the present one.

Consider variations in the duration of the mission T. Assume that R(t) and C are the reliability function and cost of the optimal solution for a duration T, and R'(t) and C' the reliability function and cost of the optimal solution for a duration T'. For notational simplicity, take

Z(T) = ∫₀ᵀ R(t) dt and Z'(T') = ∫₀^T' R'(t) dt.

THEOREM II: If T' > T, then Z(T') − Z(T) ≤ Z'(T') − Z'(T).

PROOF: By hypothesis (optimality of each solution for its own horizon), the following relationships hold:

P Z(T) − C ≥ P Z'(T) − C' and P Z'(T') − C' ≥ P Z(T') − C.

These two inequalities imply that

Z(T) − Z'(T) ≥ Z(T') − Z'(T'), or Z'(T') − Z'(T) ≥ Z(T') − Z(T).

COROLLARY: If the undominated sequence (at a given time) is consistent, then T' > T implies that R' ≥ R and C' ≥ C.

This theorem and its corollary show that as T increases, the corresponding optimal solution is either a more costly and more reliable one, in the case of consistency of the undominated sequence, or a solution such that ∫ R dt becomes greater between T and the new horizon if the undominated sequence is not consistent.

THEOREM III: If the undominated sequence at a given time is consistent and if P' ≥ P, then at the optimal solution, m'* ≥ m*.

PROOF: Suppose that there are m_i* components at stage i with P and (m_i* − v) = m_i'* with P' (v integer > 0). Let R^i and R'^i be the reliabilities on all stages except i (i.e., R/R_i and R'/R_i') in the optimal solutions corresponding to P and P', respectively. Similarly, let C^i and C'^i be the costs of these (n − 1) stages in the corresponding solutions.

By the corollary of Property II, R'(t) ≥ R(t) at the corresponding optimal solutions; thus R'^i > R^i. Moreover, by hypothesis (writing R_i = R_i(m_i*, t) and R_i' = R_i(m_i* − v, t)),

(i) P ∫₀ᵀ R_i R^i dt − C^i − m_i* c_i ≥ P ∫₀ᵀ R_i' R^i dt − C^i − (m_i* − v) c_i, or P ∫₀ᵀ (R_i − R_i') R^i dt ≥ v c_i,

and

(ii) P' ∫₀ᵀ R_i' R'^i dt − C'^i − (m_i* − v) c_i ≥ P' ∫₀ᵀ R_i R'^i dt − C'^i − m_i* c_i, or v c_i ≥ P' ∫₀ᵀ (R_i − R_i') R'^i dt.

Grouping (i) and (ii), we get

P ∫₀ᵀ (R_i − R_i') R^i dt ≥ P' ∫₀ᵀ (R_i − R_i') R'^i dt,

and, as P ≤ P' and R_i(t) ≥ R_i'(t), we get R^i > R'^i. Hence, a contradiction.

This theorem allows us to determine a minimum (maximum) number of components at each stage if a solution is known for a smaller (larger) value of P and if the undominated sequences are consistent.

Generally, the reliabilities R_i(m_i) are piecewise concave in m_i. To assume such a property is not a drastic restriction. In the case of loaded standbys, it is always satisfied. In the case of unloaded standbys, this is not always true. However, if we consider, for example, components with exponential failure laws with mean 1/λ, the gain in reliability by passing from m units to (m + 1) units is, for a one-stage problem, (λt)^m e^{−λt}/m!. This gain is a decreasing function of m for λt < m. Generally, problems will remain in such a situation because, if at the optimal solution m is less than λt for a time t less than T, this implies that the reliability at the end is not very high. For m = 1, this implies a reliability smaller than e^{−1} at time T.*

If R_i(m_i) is piecewise concave for all i, so that our assumption holds, we have the following. Suppose that, for each stage, a maximum attainable reliability has been computed, i.e., Z_i(t), the limit of R_i(m_i, t) as m_i goes to infinity; this limit always exists because R_i is increasing in m_i and bounded by 1. Let R_M^i (M being used to represent the maximum) be the corresponding maximum reliability on all the stages except i.

PROPERTY III: If the reliability at each stage is a concave function of the number of components at this stage, there exists an upper bound M_i on the number of components at each stage i such that

M_i = min { k : P ∫₀ᵀ (R_i(k + 1, t) − R_i(k, t)) R_M^i dt ≤ c_i }.

PROOF: The proof is analogous to that of Theorem V in [4].

Similarly, if a minimum number of components at each stage has been computed, and R_L^i is the corresponding minimum reliability on all the stages except i, the following property holds:

PROPERTY IV: If the reliability at each stage is a concave function of the number of components at this stage, there exists a lower bound L_i on the number of components at this stage such that

L_i = max { k : P ∫₀ᵀ (R_i(k, t) − R_i(k − 1, t)) R_L^i dt ≥ c_i }.
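The bound of Property III reduces to a one-dimensional search on a marginal-gain integral. Below is a minimal sketch (hypothetical names; the integral is approximated by a Riemann sum, and the illustration takes a single stage with loaded exponential standbys so that R_M^i ≡ 1). The lower bound L_i of Property IV is obtained the same way with the inequality reversed:

```python
import numpy as np

def upper_bound_M(P, c_i, Ri, R_rest, T, k_max=200, grid=400):
    """Property III: smallest k for which the marginal gain of the
    (k+1)-th component, P * integral over [0, T] of
    (Ri(k+1, t) - Ri(k, t)) * R_rest(t), drops to c_i or below."""
    t = np.linspace(0.0, T, grid)
    dt = t[1] - t[0]
    for k in range(1, k_max):
        gain = P * np.sum((Ri(k + 1, t) - Ri(k, t)) * R_rest(t)) * dt
        if gain <= c_i:
            return k
    return k_max

# One-stage illustration: loaded exponential standbys,
# Ri(m, t) = 1 - (1 - exp(-lam*t))**m, with the rest of the system ideal.
lam = 1.0
Ri = lambda m, t: 1.0 - (1.0 - np.exp(-lam * t)) ** m
M = upper_bound_M(P=10.0, c_i=1.0, Ri=Ri,
                  R_rest=lambda t: np.ones_like(t), T=2.0)
```

For these numbers the marginal gain of a fifth unit falls below its cost, so the search returns M = 4.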

These properties are also valid in the discounted case. A study of the asymptotic behavior of the function (1) yields the following propositions.

*We will show later that the gain on the integral of R_i, i.e., ∫ (R_i(m_i, t) − R_i(m_i − 1, t)) dt, is a decreasing function of m_i, and therefore the following properties are also fully applicable to this situation.


THEOREM IV: If P → ∞, then at the optimal solution, m_i* tends to ∞ for all i.

PROOF: m_i* tends to ∞ because L_i does. In order to have L_i ≥ N (N arbitrarily large), it is sufficient to take

P ≥ c_i / ∫₀ᵀ (R_i(N, t) − R_i(N − 1, t)) R_L^i dt,

where R_L^i(t) can be taken as R^i(1, …, 1; t), to prove the theorem.

When T tends to infinity, results are less simple. However, it is possible to give some results depending upon the integrability of the reliability functions.

THEOREM V(a): If T = ∞, and if there exists an index j such that Z_j(t) is integrable on [0, ∞), then at the optimal solution to (1), m_i* remains finite for all i.

PROOF: We apply Lebesgue's dominated convergence theorem to the function f(m_i) = R_i(m_i, t)R^i(t), which converges to Z_i(t)R^i(t) as m_i → ∞ and is bounded by the integrable function Z_j(t). Then I(f(m_i)) = ∫₀^∞ f(m_i) dt is finite and converges. Therefore, I(f(m_i)) − I(f(m_i − 1)) converges to zero as m_i goes to infinity. From Property III, there exists a maximum number of components M_i for stage i (i = 1, …, n).

In the discounted case, it is easy to see that the above theorem is always applicable: it is sufficient to replace R_i by R_i' = e^{−rt/n}R_i to see that Z_i'(t) ≤ e^{−rt/n} is integrable on [0, ∞). Therefore, in the discounted case, when the horizon is infinite, the optimal solution to the problem remains finite.

THEOREM V(b): If none of the Z_i belongs to L¹, but if

lim_{T→∞} ∫₀ᵀ (R_j(m_j, t) − R_j(m_j − 1, t)) R_L^j dt ≤ c_j/P − ε

for some positive number ε, then m* remains finite when T goes to infinity.

PROOF: By contradiction: if we take m_j* ≥ M, with M large enough that

∫₀^∞ (R_j(M, t) − R_j(M − 1, t)) R_L^j dt < c_j/P,

we get M ≥ M_j, and thus a contradiction.

For other situations, the asymptotic behavior of m* is unknown. The function R may admit at least two maxima, one at finite range and the other at infinity. A priori, it is impossible to determine which one of these maxima is the global one. However, if the left-hand side of the condition in Theorem V(b) exceeds c_j/P + ε, letting m_j go to infinity increases the value of (1) to infinity.


A situation which may frequently arise is the case of unloaded standbys. In this situation the standby units have no probability of failure until they are put into the system. This may frequently happen when only one stage is used; i.e., the system consists of one component and spare parts which are introduced into the system when the main component fails.

Let q(t) be the unreliability of one of these components, Q_n(t) the unreliability of the system made of one basic unit and (n − 1) standbys, and let Q_n'(t) be the first derivative with respect to t of Q_n(t). The unreliability of the system is given by the following expression [2]:

Q_n(t) = ∫₀ᵗ q(t − τ) Q_{n−1}'(τ) dτ for n > 1,

Q₁(t) = q(t).

Such a law can be computed or approximated in most cases. For an exponential failure law, this formula gives

(4) Q_n(t) = 1 − Σ_{k=0}^{n−1} [(λt)^k/k!] e^{−λt},

and (4) is easily integrable. The benefit of adding one unit to the system, i.e., the benefit of passing from n units to (n + 1) units, is

P ∫₀ᵀ [(λt)^n/n!] e^{−λt} dt − c

(where c is the cost of a component of this one-stage system). By solving the integral, this expression becomes

P[1 − e^{−λT}(1 + λT + … + (λT)^n/n!)]/λ − c.

This function represents the profit of adding one unit to a one-stage system with one basic unit and (n − 1) spare parts. It is decreasing in n and negative for n going to infinity. The solution n* to our problem is therefore the smallest number n such that the above expression is negative.
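For this one-stage exponential case, n* follows directly from the closed-form marginal profit (hypothetical function name):

```python
from math import exp, factorial

def optimal_spares(P, c, lam, T, n_max=100):
    """Smallest n for which the profit of adding one more unit,
    P * [1 - exp(-lam*T) * sum_{j=0}^{n} (lam*T)**j / j!] / lam - c,
    becomes negative; that n is the optimal number of units n*."""
    for n in range(n_max):
        partial = sum((lam * T) ** j / factorial(j) for j in range(n + 1))
        if P * (1.0 - exp(-lam * T) * partial) / lam - c < 0.0:
            return n
    return n_max
```

For instance, with P = 10, c = 1, λ = 1 and T = 5 (our own illustrative numbers), the marginal profit first turns negative at n = 8.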

In the case of other failure laws, the integrals are not so easy to solve. If the unreliability q(t) of the unit remains bounded between two linear functions of t, say λ₁t ≤ q(t) ≤ λ₂t, the unreliability of the system is bounded between the unreliabilities of systems of components with exponential laws with parameters λ₁ and λ₂. Therefore, the above formulas applied to λ₁ and λ₂ give approximate solutions to the problem.

In the case of multistage problems, the above type of solution is not valid and the problem is more complex. However, for exponential failure laws, for example, all the terms in R, made of products of expressions such as (4), can be integrated. If, for each stage, we compute a number n_i as above, this number is not the optimal solution but an upper bound on the number of components at each stage, because the profit of adding a component is not

P ∫₀ᵀ (R_i(m_i, t) − R_i(m_i − 1, t)) dt − c_i,

but rather

(5) P ∫₀ᵀ (R_i(m_i, t) − R_i(m_i − 1, t)) R^i dt − c_i.


If a lower bound is known on the number of components at each stage, (5) can be used, by replacing R^i by R_L^i, to generate new lower bounds on the number of components at each stage.


The former theorems enable us to parametrize the problem as soon as an optimal solution is found for a value of P and T. However, the main difficulty remains the determination of such an optimal solution. We propose the following algorithm (in the case of R_i or ∫ R_i dt concave in m_i):*


(a) Compute the limit Z_i(t), for m going to infinity, of R_i(m, t) (i = 1, …, n).

(b) Compute a maximum number of components at each stage (Property III). Stop the process when no further improvement is possible. Similarly, compute the minimum number of components at each stage. If the minimum and maximum numbers of components coincide for each stage, stop: this is the solution. Otherwise, go to the next step.

(c) Select different times and compute the undominated sequence for each of these durations, the number of components at each stage remaining between the bounds determined at the former step. If these optimal sequences are identical, assume that the optimal sequence remains consistent (even if the system is not consistent, the solution will be very near the optimal one) and go to step (d). If the optimal sequences remain identical except for costs between a and b, assume that the optimal sequence is consistent for costs smaller than a and larger than b and go to step (f). If the optimal sequences do not satisfy one of the two categories above, go to step (e).**

(d) Compute the value of (1) for all the terms of the optimal sequence and terminate by taking the term giving the largest value of (1). This is the optimal solution (or a solution very close to it).

(e) In this case, apply the branch and bound technique described in [3] and [4]. There are no fundamental theoretical difficulties in its application to the present problem; however, at each step, the integral of the reliability of the current solution considered in the algorithm must be computed, which lengthens the computations. The initial solution and lower bound on the function is either (0, …, 0) or (M₁, …, M_n), whichever gives the larger value of (1).

(f) Compute the value of (1) for all the terms of the optimal sequence with costs smaller than a or larger than b. Take the term H among them which gives the largest value of (1). Then apply the branch and bound algorithm, as in step (e), with the following restrictions: all the complete solutions must have a cost between a and b, and the initial solution and lower bound are H and the corresponding value of (1). The solution given by this algorithm is the optimal (or very close to the optimal) solution of the problem.
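As a stand-in for the branch and bound of step (e), a plain exhaustive search between the bounds of step (b) already shows the shape of the computation (hypothetical names; Riemann-sum integration):

```python
import numpy as np
from itertools import product

def best_allocation(P, costs, Ri, bounds, T, grid=300):
    """Maximize (1), P * integral_0^T prod_i Ri[i](m_i, t) dt - sum_i c_i*m_i,
    by enumerating every allocation between the per-stage bounds."""
    t = np.linspace(0.0, T, grid)
    dt = t[1] - t[0]
    best, best_val = None, float("-inf")
    for m in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        R = np.ones_like(t)
        for i, mi in enumerate(m):
            R *= Ri[i](mi, t)              # system reliability at each t
        val = P * R.sum() * dt - sum(c * mi for c, mi in zip(costs, m))
        if val > best_val:
            best, best_val = m, val
    return best, best_val
```

The enumeration grows as the product of the bound ranges, which is exactly why the paper resorts to branch and bound; the sketch is only meant to make the objective (1) concrete.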

NOTE: It does not seem possible to use an approximate solution as we did in [4], because the cancellation of the first derivatives of the objective function does not give a solvable system of equations as in [1]. In some cases, however, (1) is also a concave function of the m_i. From [4], this is certainly true in the region of m_i's satisfying the condition given there,

*As mentioned before, in the discounted case, R_i is replaced by e^{−rt/n}R_i.

**In the case where the undominated sequence would remain consistent except on some intervals [a_j, b_j], it would be possible to extend the method described in (f).


or everywhere if we have loaded standbys. If (M₁, …, M_n) satisfies such a condition, it seems possible to find a local maximum (which would probably be a global one too) by applying the same routine described in [4]. Now, if this solution did not satisfy the condition, we would have to show that the function (1) is still concave at that point, or else forget it and apply the algorithm described above.


[1] Fan, L. T., C. S. Wang, F. A. Tillman, and C. L. Hwang, "Optimization of System Reliability by the Discrete Maximum Principle," IEEE Transactions on Reliability R-16 (Sept. 1967).

[2] Gnedenko, B. V., Y. K. Belyayev, and A. D. Solovyev, Mathematical Methods of Reliability Theory (Academic Press, New York, 1969).

[3] Henin, C. G., "An Algorithm for Maximizing Reliability Through System Redundancy," Carnegie-Mellon University, Management Sciences Report #216 (Aug. 1970).

[4] Henin, C. G., "Computational Techniques for Optimizing Systems with Standby Redundancy," Nav. Res. Log. Quart. 19, 293-308 (June 1972).

[5] Kettelle, J. D., Jr., "Least-Cost Allocation of Reliability Investment," Operations Research 10, 249-265 (1962).

Eric Langford* 

University of Maine 
Orono, Maine 


This paper analyzes, from a game-theoretic standpoint, the simultaneous choice of speeds by a transitor and by an SSK which patrols back and forth perpendicular to the transitor's course. Using idealized acoustic assumptions and a cookie-cutter detection model which ignores counterdetection, we are able to present the problem as a continuous game and to determine an analytic solution. The results indicate that with these assumptions there are conditions under which neither a "go fast" nor a "go slow" strategy is optimal. The game provides a good example of a continuous game with a nontrivial solution which can be solved effectively.


This paper analyzes from a game-theoretic standpoint the simultaneous choice of speeds by a transitor and by an SSK which patrols back and forth perpendicular to the transitor's course. (An SSK is a submarine whose mission is directed against enemy submarines.) The payoff, in effect, is taken to be the SSK's detection sweep width. This game was originally investigated by D. H. Wagner and E. P. Loane in classified reports during 1963-64. Their treatment was confined to choices of speeds from a discrete set, but applied to rather general acoustic conditions. The present analysis assumes an idealized form of propagation loss versus range and of noise versus speed. These idealizations permit the sweep width to be expressed as a convenient continuous function of the two speeds; accordingly, they allow each player to make choices from a continuum of speeds, rather than from a discrete set. They also permit a comprehensive analysis of the variety of cases which can arise within the idealizations.

Subsequent to Wagner and Loane's work, an approach to this game was undertaken by Mathe- 
matica [6], treating a continuous analytic payoff function based on idealized acoustics. Unfortunately, 
inconsistent acoustic assumptions were used in [6]; these were corrected by Mathematica in a subse- 
quent report [1\ Motivated by [6] (and prior to [1]), the present analysis was undertaken, again using a 
continuous payoff function, but using acoustic assumptions generally felt to be consistent. 

In this paper, we assume that the graph of noise versus speed is linear above a breakpoint speed 
below which noise is independent of speed; this is a common assumption and was made in [6]. We 
also make the usual "spreading law" assumption, i.e., that propagation loss is proportional to the k'th
power of range when loss is expressed in power units. This of course is equivalent to the assumption
that propagation loss is proportional to the logarithm of the (k'th power of) range when loss is expressed
in decibels. (The inconsistency in [6] appears to be on this point.) We take for a payoff function
what is essentially the SSK's kinematically enhanced sweep width; the SSK, of course, attempts to

*Research on this paper was performed when the author was with the Naval Postgraduate School and Daniel H. Wagner, Associates.



maximize this quantity and the transitor attempts to minimize it. The SSK will thus be maximizing 
his detection probability if we assume a cookie-cutter detection model, i.e., that detection occurs when 
and only when the signal excess (assumed deterministic) reaches a threshold value. We ignore counter- 
detection by the transitor. This will be a realistic assumption if the SSK's acoustic capability is far
superior to that of the transitor. The more general problem, which takes into account counterdetec- 
tion and which allows for a stochastic detection model, appears to be quite difficult to solve within this 
framework. However, in a discretized form, it was handled satisfactorily by Wagner and Loane in the 
above-mentioned reports. 

The game is described in abstract terms in the first section, i.e., the payoff function is given in a 
form which is equivalent (for purposes of the game's analysis) to the formula for the SSK's kinematically
enhanced sweep width. (The SSK's effective sweep width is increased by his back and forth patrol.) The 
properties of the payoff function are developed into geometric criteria for solution. In the second section, 
graphical methods of finding the solution are given and illustrated by examples. A numerical solution is 
described in the third section. The fourth section enumerates the possible outcomes as combinations of 
the pure and mixed strategies. Identification of the abstract game with the real SSK versus transitor 
game is given in the final section. 

In an earlier version, this paper was submitted as a Memorandum to Commander, Submarine 
Development Group Two, in New London [4]. In this earlier version are included graphs of all possible 
cases and a Fortran computer program to solve the game. A subsequent classified memorandum applied 
the analysis to some "real life" numbers. 


We consider the following game. The maximizing player (SSK) chooses a speed u in the range
0 < u_min ≤ u ≤ u_max, and the minimizing player (transitor) chooses a speed v in the range
0 < v_min ≤ v ≤ v_max. The payoff function is

F(u, v) = e^(cv−u) √(1 + u²/v²),

where c > 0 is a constant.

The explicit identification of this payoff function with the SSK's detection sweep width will be 
clarified later. For the time being, we treat the game abstractly. 

Computing the partial derivative with respect to v, we see that

F_2(u, v) = F(u, v) [c − u²/(v(u² + v²))].

The second partial derivative with respect to v is 

F_22(u, v) = F(u, v) {[c − u²/(v(u² + v²))]² + u²(u² + 3v²)/[v²(u² + v²)²]},

so that F_22 > 0. That is, F is convex in its second argument; it is well known that this implies the
following facts (see [2, p. 80] or [5, p. 259]):

1. The minimizing player (transitor) always has an optimal pure strategy.


2. The maximizing player (SSK) always has an optimal mixed strategy involving at most two speed 
choices; i.e., he either has an optimal pure strategy, or an optimal mix of two speeds. 
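As a quick numerical sanity check (a sketch, not part of the paper), the payoff and its v-derivatives can be coded directly from the displayed formulas and compared against finite differences; the test point (u, v, c) = (0.7, 0.4, 1.5) is an arbitrary choice.

```python
import math

# Payoff F(u, v) = exp(c*v - u) * sqrt(1 + u^2/v^2) and its v-partials,
# written directly from the formulas displayed above.
def F(u, v, c):
    return math.exp(c * v - u) * math.sqrt(1.0 + u**2 / v**2)

def F2(u, v, c):
    # F_2(u, v) = F(u, v) * [c - u^2 / (v (u^2 + v^2))]
    return F(u, v, c) * (c - u**2 / (v * (u**2 + v**2)))

def F22(u, v, c):
    # F_22 = F * { [c - u^2/(v(u^2+v^2))]^2 + u^2 (u^2 + 3 v^2) / [v^2 (u^2+v^2)^2] }
    t = c - u**2 / (v * (u**2 + v**2))
    return F(u, v, c) * (t**2 + u**2 * (u**2 + 3 * v**2) / (v**2 * (u**2 + v**2)**2))

u, v, c, h = 0.7, 0.4, 1.5, 1e-4
fd2 = (F(u, v + h, c) - F(u, v - h, c)) / (2 * h)                 # central differences
fd22 = (F(u, v + h, c) - 2 * F(u, v, c) + F(u, v - h, c)) / h**2
print(abs(F2(u, v, c) - fd2) < 1e-6, abs(F22(u, v, c) - fd22) < 1e-4, F22(u, v, c) > 0)
```

The last value printed confirms the convexity claim F_22 > 0 at the test point.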

Let us fix v and consider F as a function of the one variable u. Since F is continuous and restricted
to a closed bounded interval, it assumes a maximum value; this maximum must be taken on at an
endpoint (i.e., either u_min or u_max) or at an interior relative maximum, which can, in this case, be
located by differentiation. To this end, we form the partial derivative of F with respect to u, and equate
it to zero:

F_1(u, v) = F(u, v) (u − u² − v²)/(u² + v²) = 0.

Now F never vanishes, so that F_1(u, v) = 0 iff u = u² + v². The solution set of this equation in the u-v
plane is the semicircle

{(u, v): (u − 1/2)² + v² = 1/4, and v > 0}.

From the form of F_1, it is evident that F_1(u, v) > 0 if (u, v) is inside the semicircle, and that
F_1(u, v) < 0 if (u, v) is outside the semicircle. Thus the graph of F(u, v) versus u for fixed v can have
three qualitatively different forms, as shown in Figure 1. The case v = 0.6 is typical of v > 1/2 and the
case v = 0.4 is typical of v < 1/2.

Figure 1. Qualitatively different forms of F (curves shown for v = 0.6, 0.5, and 0.4).


Let us define the function φ as follows:

φ(v) = max {F(u, v): u_min ≤ u ≤ u_max}.

From the above observations, it is not difficult to infer the following:

(1) If v ≥ 1/2, then φ(v) = F(u_min, v).

(2) If v < 1/2, then

(1) φ(v) = max {F(u_min, v), F(u_max, v), F(u_0, v)},

where u_0 = 1/2 + √(1/4 − v²), if this falls within the interval u_min ≤ u ≤ u_max; otherwise we ignore u_0.
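Formula (1) can be sketched in code (a hedged illustration, not the paper's program; the numbers below are arbitrary). Note that including u_max as a candidate even when v ≥ 1/2 is harmless, since F is then decreasing in u and the maximum over the candidates is still attained at u_min.

```python
import math

def F(u, v, c):
    return math.exp(c * v - u) * math.sqrt(1.0 + u**2 / v**2)

# phi(v): the maximum over admissible u can only occur at u_min, u_max, or
# the interior critical point u0 = 1/2 + sqrt(1/4 - v^2) (when v < 1/2).
def phi(v, c, u_min, u_max):
    candidates = [u_min, u_max]
    if v < 0.5:
        u0 = 0.5 + math.sqrt(0.25 - v**2)
        if u_min <= u0 <= u_max:
            candidates.append(u0)
    return max(F(u, v, c) for u in candidates)

# Compare against a brute-force grid maximum at illustrative parameter values.
v, c, u_min, u_max = 0.35, 1.0, 0.1, 0.9
brute = max(F(u_min + i * (u_max - u_min) / 10000, v, c) for i in range(10001))
print(abs(phi(v, c, u_min, u_max) - brute) < 1e-4)
```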

Since F is convex in its second argument, v, it follows that φ is continuous. Its minimum value will be
the value of the game; moreover, if φ assumes its minimum at v = v_0, then v_0 is an optimal pure strategy
for the minimizing player. By the convexity of F, φ is unimodal (hence the above v_0 is defined uniquely).
Since φ is unimodal, we can locate its minimum numerically with great efficiency using a binary search;
graphical methods are also possible.

Similarly, we can fix u and solve the equation F_2(u, v) = 0:

F_2(u, v) = F(u, v) [c − u²/(v(u² + v²))] = 0.

The solution set of this equation is

{(u, v): cv³ + cu²v = u², (u, v) ≠ (0, 0)}.

By the Cardan-Tartaglia formula, this defines v as a function of u as follows:

v = ∛(u²/(2c) + √(u⁴/(4c²) + u⁶/27)) + ∛(u²/(2c) − √(u⁴/(4c²) + u⁶/27)).

Conversely, if v < 1/c, we can solve for u as a function of v:

u = √(cv³/(1 − cv)).
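The cubic root and its inverse can be checked numerically; this is an illustrative sketch (the test values u = 0.6, c = 1.2 are arbitrary), applying Cardano's formula to the depressed cubic t³ + pt + q = 0 with p = u², q = −u²/c.

```python
import math

def cbrt(x):
    # real cube root, valid for negative arguments as well
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

# h(u): the positive root v of c v^3 + c u^2 v - u^2 = 0, via Cardan-Tartaglia.
def h(u, c):
    p, q = u**2, -u**2 / c
    d = math.sqrt(q**2 / 4 + p**3 / 27)   # discriminant term is positive for u != 0
    return cbrt(-q / 2 + d) + cbrt(-q / 2 - d)

# Inverse relation on the same locus: for v < 1/c, u = sqrt(c v^3 / (1 - c v)).
def u_of_v(v, c):
    return math.sqrt(c * v**3 / (1 - c * v))

u, c = 0.6, 1.2
v = h(u, c)
print(abs(c * v**3 + c * u**2 * v - u**2) < 1e-9, abs(u_of_v(v, c) - u) < 1e-9)
```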

Figure 2 shows the graphs of F_1(u, v) = 0 and F_2(u, v) = 0 for several values of c.

The intersection of the graphs of F_1(u, v) = 0 and F_2(u, v) = 0 will give a saddle point if u > 1/2,
i.e., if c > 1. If there is a pure strategy minimax solution to the game which is interior for both players
(i.e., neither u_min nor u_max for the SSK and neither v_min nor v_max for the transitor), then the solution
must occur at this saddle point. (Edge minimaxes need not be of this form.) To find this saddle point,
we solve the following equations simultaneously:

F_1(u, v) = F_2(u, v) = 0.



c = 05 


c = I.O 



>F 2 (u,v)=0 

1 2 3 0.4 0.5 0.6 0.7 8 09 10 SSK's SPEED u 

Figure 2. Graphs of F t (u, v)=0 and of F 2 (u, v)=0 for several values of c. 

Strangely enough, the answer is exceedingly simple:

u = c²/(c² + 1),  v = c/(c² + 1).

Note that there can be no interior minimax if this saddle point falls outside the rectangle of admissible speeds.
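The claimed saddle point is easy to verify (a check written for this transcription, not from the paper): it must satisfy both first-order conditions, i.e., lie on the semicircle u = u² + v² (so F_1 = 0) and on the cubic cv(u² + v²) = u² (so F_2 = 0).

```python
# Verify that u* = c^2/(c^2+1), v* = c/(c^2+1) satisfies both critical-point
# conditions; c = 2.0 is an arbitrary illustrative value with c > 1.
c = 2.0
u = c**2 / (c**2 + 1)
v = c / (c**2 + 1)
print(abs(u - (u**2 + v**2)) < 1e-12, abs(c * v * (u**2 + v**2) - u**2) < 1e-12)
```

Algebraically, u² + v² = c²(c² + 1)/(c² + 1)² = u and cv(u² + v²) = c⁴/(c² + 1)² = u², which also shows u > 1/2 exactly when c > 1.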



Define the function f as follows: For any u_0, u_min ≤ u_0 ≤ u_max, we define v_0 = f(u_0) iff F(u_0, v_0) = min
{F(u_0, v): v_min ≤ v ≤ v_max}. That is, for any admissible u, f(u) is the v which minimizes F. By the
v-convexity of F, f is a well-defined, single-valued function. Moreover, f is continuous and monotone
increasing. In fact, if we stay away from v_min and v_max, f is strictly increasing; more precisely, if u_1 < u_2
are such that v_min < f(u_1) ≤ f(u_2) < v_max, then f is strictly increasing on the interval [u_1, u_2]. See
Figures 5, 6, and 7 for examples.

Actually, we can give a simple formula for f. Let h(u) be the solution to F_2(u, h(u)) = 0, i.e.,

h(u) = ∛(u²/(2c) + √(u⁴/(4c²) + u⁶/27)) + ∛(u²/(2c) − √(u⁴/(4c²) + u⁶/27)).

For unrestricted v, F(u_0, v) is convex and has an absolute minimum at v = h(u_0). Therefore if v_min
≤ h(u_0) ≤ v_max, then f(u_0) = h(u_0). If h(u_0) ≥ v_max, then f(u_0) = v_max, and if h(u_0) ≤ v_min, then
f(u_0) = v_min. We can summarize these cases as follows:

f(u_0) = min {v_max, max [v_min, h(u_0)]}.

It is straightforward to determine f(u_0) graphically. Let u_0 be fixed and consider the line segment

S = {(u, v): u = u_0 and v_min ≤ v ≤ v_max}.

The three possibilities, namely f(u_0) = v_min, f(u_0) = v_max, and f(u_0) = h(u_0), can be exhibited graphically
as follows:

(1) If the line S lies completely above the graph of h(u) versus u, then f(u_0) = v_min.

(2) If the line S lies completely below the graph of h(u) versus u, then f(u_0) = v_max.

(3) If the line S intersects the graph of h(u) versus u at (u_0, v_0), then f(u_0) = v_0 = h(u_0).

These three possibilities are graphed in Figure 3. 


Figure 3. Three possibilities in the determination of f(u); (c = 1 in this example).

In a similar fashion we define the "function" g as follows: For any v_0, v_min ≤ v_0 ≤ v_max, we define
u_0 = g(v_0) iff F(u_0, v_0) = max {F(u, v_0): u_min ≤ u ≤ u_max} = φ(v_0). From Figure 1, it is clear that this
need not, in general, define a single-valued function. We shall subsequently show that either g is a
continuous single-valued function or that the graph of g consists of two continuous pieces which overlap
only at an endpoint. More precisely, in this second circumstance, there exists a point v_0 such that
v_min ≤ v_0 ≤ v_max, and such that g restricted to either of the subintervals [v_min, v_0) or (v_0, v_max] is a
continuous single-valued function; however at v = v_0, g(v) is bivalent; it has the two values g(v_0 − 0)
and g(v_0 + 0). (It will also be shown later that this right-hand limit can only be u_min and that the left-hand
limit can be either u_max or u_0 = 1/2 + √(1/4 − v_0²).) In either case, g is monotone decreasing:
g(v_1) ≥ g(v_2) whenever v_1 ≤ v_2. Figure 5 illustrates the first possibility, while Figures 6 and 7 illustrate
the second.

Note that when we fix v and regard F(u, v) as a function only of u, the constant c enters the expression
for F(u, v) only as a constant multiplier. Since we are interested in locating the maximizing
u for fixed v, it follows that the value of c is unimportant in the discussion which follows. That is, the
maximizing u is located in the same place no matter what value c has.

Figure 4. Six possibilities in the determination of g(v).

We refer to Figure 4: the semicircle ABC is the locus of F_1(u, v) = 0. The points on the quarter-circle
AB are relative minima for F(u, v) considered as a function of u for fixed v, and the points on
the quarter-circle BC are relative maxima. The point B itself is an inflection point (cf. Figure 1). The
curve DB is defined as follows: if (u_1, v_0) lies on DB, then F(u_1, v_0) = F(u_2, v_0), where (u_2, v_0) lies
on the quarter-circle BC. The curve DB is thus obtained as the solution set of the following transcendental
equation

F(u, v) = F(1/2 + √(1/4 − v²), v),

where 0 < u ≤ 1/2.

Let v_0 be fixed and let S' denote the line segment

{(u, v): u_min ≤ u ≤ u_max and v = v_0}.

There are six possibilities:

(1) The line S' lies completely within ABC; then g(v_0) = u_max. The line S' may meet, but not cross ABC.

(2) The line S' lies completely outside of ABC; then g(v_0) = u_min. The line S' may meet, but not
cross ABC.

(3) The line S' crosses AB, but does not meet or cross DB. In this case, g(v_0) may be u_min, u_max,
or both, depending on the relative sizes of F(u_min, v_0) and F(u_max, v_0).

(4) The line S' crosses DB; then g(v_0) = u_min.

(5) The line S' meets or crosses BC, but does not meet DB. In this case, the maximizing u_0 occurs
at the intersection of S' and BC. Evidently g(v_0) = u_0 = 1/2 + √(1/4 − v_0²).

(6) The line S' meets, but does not cross DB. In this case, g(v_0) = u_min, unless S' also meets or
crosses BC. In this latter circumstance, g(v_0) has the two values u_min and u_0, where u_0 is at the intersection
of S' and BC as in the case above.

These possible cases are all graphed in Figure 4.

If we graph g(v) versus v, the graph will be a nice continuous curve as long as g(v) "stays put,"
i.e., as long as g(v) is one or the other of the endpoints or is u_0. Problems arise at the two "transition
stages":

A. As in Case 3, when F(u_min, v_0) = F(u_max, v_0) = φ(v_0). This will be called a type A transition.

B. As in Case 6, when F(u_min, v_0) = F(u_0, v_0) = φ(v_0). This will be called a type B transition.

As v increases through a transition stage v_0, g(v) will make an abrupt jump. Precisely at the transition
stage v = v_0, g(v) will be two-valued. It will now be shown that as v increases through the transition
stage v_0, one of the following two things will happen:

A. If v_0 is a type A transition, then g(v_0) must jump from u_max to u_min.

B. If v_0 is a type B transition, then g(v_0) must jump from u_0 to u_min.

It is a consequence of the above that g can have at most one transition stage, since any transition
puts g(v) in the "absorbing state" u_min. An example of a type A transition is found in Figure 7, and
an example of a type B transition is found in Figure 6.

To prove the above assertion, let us suppose that u_1 < u_2 and that for some v_0, F(u_1, v_0) = F(u_2, v_0).
If we define the function G(v) = F(u_1, v) − F(u_2, v), then by assumption, G(v_0) = 0. By computing
the (continuous) derivative G'(v), it is easy to show that G'(v_0) > 0, so that G is increasing in a neighborhood
of v_0; in other words, as v increases through v_0, F(u_1, v) is first less than F(u_2, v) and then greater,
so that a transition can occur from a larger value of u to a smaller value, never the other way.

The "function" g is monotone decreasing since it decreases at its transition point (if there is one)
and since it obviously must decrease at points other than transition points. As is the case with f, if
v_1 < v_2 are such that u_min < g(v_2) ≤ g(v_1) < u_max, then g is strictly decreasing on the interval [v_1, v_2].

If we plot both f and g within the rectangle of admissible speeds defined by

{(u, v): u_min ≤ u ≤ u_max and v_min ≤ v ≤ v_max},

then one of two things will occur: 

(1) The two graphs will intersect at a single point (u_0, v_0). In this case, u_0 is an optimal pure
strategy for the maximizing player (SSK) and v_0 is an optimal pure strategy for the minimizing player
(transitor). See Figure 5 for an example of this case.

(2) The two graphs will not intersect. More precisely, there will exist a transition point v_0 such
that g(v_0) has the two values u_1 and u_2, where u_1 < u_2 and where u_1 < f⁻¹(v_0) < u_2. See Figures 6
and 7 for examples of this. Note that f⁻¹ is defined at v_0 since f is strictly increasing as it passes through





Figure 5. Graphical solution of the game — Example 1 (c = 2.0).


Figure 6. Graphical solution of the game — Example 2 (optimal SSK mix: u = 0.1500 with probability 0.3972, u = 0.6981 with probability 0.6028).


the hole in the graph of g. In this case, the minimizing player again has the optimal pure strategy v_0,
but the maximizing player now has an optimal mixed strategy given by u_1 with probability p and u_2
with probability 1 − p. The constant p is found by solving the following equation:

p·F_2(u_1, v_0) + (1 − p)·F_2(u_2, v_0) = 0.
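Since the equation is linear in p, it can be solved in closed form; the sketch below (with illustrative numbers, not the paper's examples) checks that the resulting mix makes the transitor's marginal payoff in v vanish. Note p lies in [0, 1] exactly when F_2(u_1, v_0) and F_2(u_2, v_0) have opposite signs, which is the situation at an optimal mix.

```python
import math

def F(u, v, c):
    return math.exp(c * v - u) * math.sqrt(1.0 + u**2 / v**2)

def F2(u, v, c):
    return F(u, v, c) * (c - u**2 / (v * (u**2 + v**2)))

# Solve p*F2(u1, v0) + (1 - p)*F2(u2, v0) = 0 for the mixing probability p.
def mix_probability(u1, u2, v0, c):
    a, b = F2(u1, v0, c), F2(u2, v0, c)
    return b / (b - a)

# Illustrative numbers (not taken from the paper):
u1, u2, v0, c = 0.15, 0.70, 0.35, 1.0
p = mix_probability(u1, u2, v0, c)
marginal = p * F2(u1, v0, c) + (1 - p) * F2(u2, v0, c)
print(0.0 < p < 1.0, abs(marginal) < 1e-12)
```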






Figure 7. Graphical solution of the game — Example 3.

No other possibilities (e.g., two intersections) can arise, by the aforementioned monotonicity properties
of f and g.

Three numerical examples of this graphical solution are given to make this more clear. In each
of these figures, the graph of f is indicated by a dashed line and the graph of g is indicated by a dot-dash
line. The box is the rectangle of admissible speeds defined above.


The following is a step-by-step procedure for solving the game. It essentially repeats the graph- 
ical procedure, but from a point of view which emphasizes suitability for computation. This procedure 
has been programmed for the GE-235 computer: a copy of the program and sample output are included 
in [4]. The following step-by-step procedure can be considered a macroscopic flow chart of the com- 
puter program. 

(1) Locate the minimum of φ(v) = max_u F(u, v). Since φ is unimodal, this can be done easily by
iterative computation using a binary search. The function φ(v) itself is evaluated by using formula
(1) derived earlier. The minimum value of φ(v) is the value of the game.

(2) Let v_0 be that unique number such that φ(v_0) = min_v φ(v). This is obtained automatically as
a by-product of step 1. Then v_0 is the optimal pure strategy for the minimizing player (transitor).

(3) Solve the equation F(u, v_0) = φ(v_0) for u by checking the three possible places where the maximum
could occur (viz. u_min, u_max, and u_0 = 1/2 + √(1/4 − v_0²)). If there is a unique solution u*, then u* is
an optimal pure strategy for the maximizing player (SSK). If there are two distinct solutions u_1 < u_2,
then a mix of u_1 with probability p and u_2 with probability 1 − p will be optimal. The number p is the
solution to the equation

p·F_2(u_1, v_0) + (1 − p)·F_2(u_2, v_0) = 0.
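The three steps above can be sketched as follows. This is a minimal Python rendering of the flow chart, not the original Fortran/GE-235 program of [4]; a golden-section search stands in for the binary search on the unimodal φ, and the parameter values at the bottom are illustrative (with c = 2 and a roomy rectangle, the interior saddle point (0.8, 0.4) should emerge as a pure solution).

```python
import math

def F(u, v, c):
    return math.exp(c * v - u) * math.sqrt(1.0 + u**2 / v**2)

def F2(u, v, c):
    return F(u, v, c) * (c - u**2 / (v * (u**2 + v**2)))

def solve_game(u_min, u_max, v_min, v_max, c, tol=1e-10):
    def candidates(v):                         # possible maximizers of F(., v)
        cand = [u_min, u_max]
        if v < 0.5:
            u0 = 0.5 + math.sqrt(0.25 - v**2)
            if u_min <= u0 <= u_max:
                cand.append(u0)
        return cand

    def phi(v):                                # formula (1)
        return max(F(u, v, c) for u in candidates(v))

    # Steps 1-2: phi is unimodal, so shrink a bracket around its minimizer.
    g = (math.sqrt(5) - 1) / 2
    a, b = v_min, v_max
    while b - a > tol:
        x1, x2 = b - g * (b - a), a + g * (b - a)
        if phi(x1) < phi(x2):
            b = x2
        else:
            a = x1
    v0 = (a + b) / 2
    value = phi(v0)                            # value of the game

    # Step 3: where does F(., v0) attain phi(v0)?
    sols = sorted(u for u in candidates(v0) if abs(F(u, v0, c) - value) < 1e-6)
    if len(sols) == 1:
        return value, v0, [(sols[0], 1.0)]     # pure strategy for the SSK
    u1, u2 = sols[0], sols[-1]                 # optimal mix of two speeds
    p = F2(u2, v0, c) / (F2(u2, v0, c) - F2(u1, v0, c))
    return value, v0, [(u1, p), (u2, 1.0 - p)]

value, v0, mix = solve_game(0.1, 0.9, 0.1, 0.9, 2.0)
print(round(v0, 3), [(round(u, 3), round(p, 3)) for u, p in mix])
```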



From an examination of Figure 1 or otherwise, we see that there are five qualitatively distinct
possibilities for the maximizing player:

U(1). A pure strategy of u_min.

U(2). A pure strategy of u_max.

U(3). A pure strategy of u_0, where u_min < u_0 < u_max.

U(4). A mixed strategy of u_min and u_max.

U(5). A mixed strategy of u_min and u_0, where u_min < u_0 < u_max.

Since the minimizing player always has a pure strategy, there are only three possibilities for him:

V(1). A pure strategy of v_min.

V(2). A pure strategy of v_max.

V(3). A pure strategy of v_0, where v_min < v_0 < v_max.

Apparently there are 15 cases to consider. However, it is known [5, p. 267] that in cases V(1)
and V(2), the maximizing player must also have a pure strategy. (This can be inferred also from the
geometric reasoning previously used.) Thus there are only 11 possible cases. In [4], examples are
given of each of these 11 cases. We remark that for certain values of c, some cases are forbidden. For
example, if c ≤ 1, then the case U(3)–V(3) cannot occur.


We shall now identify the foregoing abstract game with a real SSK versus transitor game. 

The SSK moves back and forth at constant speed u across a rectangular barrier zone. The transitor 
enters the zone on a course perpendicular to the SSK's course at a constant speed v' . 

The payoff for the SSK is detection sweep width. He attempts to maximize this quantity and the
transitor attempts to minimize it. Let L_S(v') denote the radiated noise (in decibels, measured at 1 yard
from the source) of the transitor as a function of its speed v', and let L_N(u') denote the self noise (in
decibels) of the SSK as a function of its speed u'.

We assume that the noise curves of both SSK and transitor are of the form shown in Figures 8 
and 9, respectively. 

Figure 8. Noise curve for SSK.




Figure 9. Noise curve for transitor.

It is obvious that the SSK will never travel more slowly than u'_min and that the transitor will never
travel more slowly than v'_min. We therefore assume that L_N(u') and L_S(v') are both linear for all u'
and v', but in the analysis we will not consider any speeds less than these minimum speeds. The
maximum speeds, u'_max and v'_max, are of course determined by the physical characteristics of the
respective submarines. We have then the following formulas for L_N(u') and L_S(v'):

L_N(u') = L_N^min + b(u' − u'_min),
L_S(v') = L_S^min + a(v' − v'_min),

where a and b are the slopes, measured in decibels/knot.

Assume that propagation loss obeys a spreading law, so that the decibel loss in propagating
from 1 to R yd is k log_10 R, for some fixed k > 0.

The unenhanced sweep width W for the SSK is given by

k log_10 (W/2) = L_S(v') − L_N(u') + N_DI − N_RD,

where N_DI and N_RD are the SSK's directivity index and recognition differential. (Of course all terms
in this equation are taken at the frequency and bandwidth of interest.)

If we solve this equation for W, we obtain

W = 2 exp {(1/k)(log 10)[L_S(v') − L_N(u') + N_DI − N_RD]}

= 2 exp {(1/k)(log 10)[L_S^min + av' − av'_min − L_N^min − bu' + bu'_min + N_DI − N_RD]}

= 2 exp {(1/k)(log 10)[L_S^min − av'_min − L_N^min + bu'_min + N_DI − N_RD]} exp {(1/k)(log 10)(av' − bu')}.

If we make the following substitutions:

u = (b/k)(log 10)u',  v = (b/k)(log 10)v',

K = 2 exp {(1/k)(log 10)[L_S^min − L_N^min − av'_min + bu'_min + N_DI − N_RD]},

c = a/b,

then W is of the form

W = K e^(cv−u),

where K and c are positive constants.
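The substitution can be checked numerically. All of the acoustic numbers below are made-up illustrations (levels in dB, slopes in dB/knot, speeds in knots), not values from the paper; the point is only that the normalized form K·e^(cv−u) reproduces the sweep width computed directly from the decibel equation.

```python
import math

# Illustrative (assumed) acoustic parameters:
k = 20.0                                  # spreading-law coefficient: loss = k*log10(R) dB
a, b = 1.5, 2.0                           # noise-curve slopes, dB/knot
LS_min, LN_min = 120.0, 60.0              # breakpoint noise levels, dB
N_DI, N_RD = 15.0, 10.0                   # directivity index, recognition differential
up_min, vp_min = 5.0, 4.0                 # breakpoint speeds, knots
u_p, v_p = 12.0, 8.0                      # true (primed) speeds, knots

LS = LS_min + a * (v_p - vp_min)          # transitor radiated noise at speed v'
LN = LN_min + b * (u_p - up_min)          # SSK self noise at speed u'

# Direct solution of k*log10(W/2) = LS - LN + N_DI - N_RD:
W_direct = 2.0 * 10 ** ((LS - LN + N_DI - N_RD) / k)

# Normalized form W = K * exp(c*v - u) with u = (b/k)ln(10)u', v = (b/k)ln(10)v', c = a/b:
scale = (b / k) * math.log(10)
u, v, c = scale * u_p, scale * v_p, a / b
K = 2.0 * math.exp((math.log(10) / k) * (LS_min - LN_min - a * vp_min + b * up_min + N_DI - N_RD))
W_norm = K * math.exp(c * v - u)

print(abs(W_direct - W_norm) / W_direct < 1e-12)
```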


As noted by Wagner and Loane and by Mathematica, the kinematic enhancement of sweep width
in the back and forth patrol is a multiplicative increase in the approximate amount √(1 + (u'/v')²),
providing the SSK's patrol legs are substantially longer than the acoustic sweep width W. (See [3,
Equation (7.1.4)].) Since the kinematic enhancement factor depends only on the ratio u'/v' = u/v,
the results of the previous analysis apply. Here the primed variables, namely u' and v', refer to true
ship speeds (in knots). The unprimed variables, namely u and v, refer to the "normalized speeds" as
considered in the solution to the game. All graphs, etc., are referred to these normalized speeds.


I would like to thank Dr. Daniel H. Wagner for his help in the preparation of this paper. The work 
was originally supported by ONR Contract Nonr-4784(00). The writing of this paper was supported 
by an ONR Foundation Grant (FY1968). 


[1] Agin, Norman I., et al., "The Application of Game Theory to ASW Detection Problems," Mathematica Report, Princeton, New Jersey (Sept. 30, 1967).

[2] Karlin, S., Mathematical Methods and Theory in Games, Programming, and Economics (Addison-Wesley, Reading, Mass., 1959).

[3] Koopman, B. O., "Search and Screening," OEG Report No. 56, Operations Evaluation Group, Office of the Chief of Naval Operations, Navy Department (1946).

[4] Langford, E. S., "Game-Theoretic Analysis of Choice of Speeds by SSK and Transitor," Daniel H. Wagner, Associates Memorandum to CSDG-2 (Nov. 17, 1966).

[5] McKinsey, J., Introduction to the Theory of Games (McGraw-Hill, New York, 1952).

[6] "A Study of Optimal Patrol and Transit Strategies in a Rectangular Barrier Zone Using Mathematical Games," Mathematica Report, Princeton, New Jersey.


Lawrence D. Stone 

Daniel H. Wagner, Associates 

Paoli, Pennsylvania 


This paper considers the problem of finding optimal solutions to a class of separable 
constrained extremal problems involving nonlinear functionals. The results are proved for 
rather general situations, but they may be easily stated for the case of search for a stationary 
object whose a priori location distribution is given by a density function on R, a subset of
Euclidean n-space. The functional to be optimized in this case is the probability of detection
and the constraint is on the amount of effort to be used. 

Suppose that a search of the above type is conducted in such a manner as to produce the 
maximum increase in probability of detection for each increment of effort added to the 
search. Then under very weak assumptions, it is proven that this search will produce an opti- 
mal allocation of the total effort involved. Under some additional assumptions, it is shown 
that any amount of search effort may be allocated in an optimal fashion. 


In this paper we consider the relationship between incrementally optimal allocations and totally 
optimal allocations. Motivation for studying this relationship arises naturally in planning a search for 
a lost object. Suppose that the search planner is given authorization to search for a fixed time interval, 
and he conducts the search to produce the maximum probability of detection at the end of the interval. 
If the search fails to detect the lost object within the allotted time, the planner may be given authoriza- 
tion to continue searching for an additional time increment. In this case the planner may allocate the 
additional search effort to maximize the probability of detection in the given increment. Having done 
this, one may ask whether the search could have produced a higher detection probability if it were 
known in advance that both the initial time interval and the added increment were available. 

In mathematical terms the search problem is to allocate optimally a given amount of effort in order
to detect a stationary object, the target, located in Euclidean n-space, R. There is a function f which
gives the probability density of the target's location. Suppose T is the amount of effort available for the
search. Then the search planner seeks a function q*: R → [0, ∞) such that ∫_R q*(x)dx ≤ T and

(1.1) ∫_R b(x, q*(x))f(x)dx = max {∫_R b(x, q(x))f(x)dx : q ≥ 0, ∫_R q(x)dx ≤ T}.

The function b(x, •) is the local effectiveness function at x. That is, b(x, y) gives the conditional prob- 
ability of detecting the target given it is located at x and the effort density is y at x. The integral on the 
left of (1.1) gives the probability of detecting the target when using allocation q*. The function q* is 
called an optimal allocation. This problem has an obvious analog when R is replaced by a countable set 
of locations or boxes. 

This research was supported by the Naval Analysis Programs, Office of Naval Research, under Contract No. N00014- 


420 L. D. STONE

For the case where b(x, y) = 1 − e^(−y) for x ∈ R and y ≥ 0, Koopman [4, p. 617] made the following
observation. Suppose one allocates T_1 amount of effort in an optimal fashion, but fails to detect the
target. An increment T_2 of effort then becomes available. If one allocates this additional effort in an
incrementally optimal manner (i.e., optimal considering the previous allocation of T_1 amount of effort),
then one obtains an optimal allocation of T_1 + T_2 effort. That is, two incrementally optimal allocations
produce a totally optimal allocation.

In [2] an incomplete attempt was made to show that incrementally optimal allocations produce
totally optimal allocations provided that ∂b(x, y)/∂y is a positive monotonic nonincreasing function
of y for x ∈ R. In section 2 of this paper we show that for any Borel measurable local effectiveness
function, incrementally optimal allocations are totally optimal whenever the target's probability distri- 
bution is given by a density function as in (1.1). In the case where the search space is countable, we 
prove that concavity of the local effectiveness function guarantees that incrementally optimal alloca- 
tions are totally optimal. In addition, it is shown by counterexample that this property need not hold for 
countable search spaces if the local effectiveness function is not concave. 

A search plan is called uniformly optimal if it maximizes the probability of detection at each instant 
during the search. In section 3, we show the existence of uniformly optimal search plans under addi- 
tional hypotheses which are given there. 

Our results hold in a more general situation than that of search theory. Thus, we introduce the
following framework, which is substantially the same as that in [6], one difference being that we deal
only with Borel functions. Let R be a Borel subset of Euclidean n-space. We fix Borel functions L and
U with L ≤ U which are defined on R. The functions L and U may take infinite values.

Define Ω = {(x, y): x ∈ R, |y| < ∞ and L(x) ≤ y ≤ U(x)}. We fix a real-valued Borel function e
defined on Ω and the family Ξ of a.e. (with respect to Lebesgue measure) real-valued Borel functions
q defined on R such that L ≤ q ≤ U. For q ∈ Ξ we understand e(·, q(·)) to be a function from R to the
reals. Define

Φ = Ξ ∩ {q : e(·, q(·)) and q are integrable},

and let

E(q) = ∫_R e(x, q(x))dx and C(q) = ∫_R q(x)dx for q ∈ Φ.

All integration is Lebesgue integration. A q* ∈ Φ is said to be optimal if

E(q*) = max {E(q) : q ∈ Φ and C(q) = C(q*)}.

In the case where L(x) = 0, U(x) = ∞ for x ∈ R and e(x, y) = f(x)b(x, y) for (x, y) ∈ Ω, E(q) becomes
the probability of detecting the target with allocation q and C(q) becomes the amount of effort
required by q. Then an optimal q* maximizes the probability of detection which can be obtained with
effort C(q*).

A function f defined on the real line is said to be increasing if y ≥ x implies f(y) ≥ f(x). A function
f is said to be concave if for all x, y in the domain of f, f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y) for
0 ≤ α ≤ 1.


For i = 1, 2, . . ., let q_i ∈ Φ be such that q_1 ≤ q_2 ≤ . . .. Let q_0 = L. If

E(q_i) = max {E(q) : q ≥ q_{i−1}, q ∈ Φ and C(q) = C(q_i)} for i = 1, 2, . . .,

then we say that (q_1, q_2, . . .) is an incrementally optimal sequence. If q_i satisfies

E(q_i) = max {E(q) : q ∈ Φ and C(q) = C(q_i)} for i = 1, 2, . . .,

then (q_1, q_2, . . .) is said to be a totally optimal sequence. Define

ℓ(x, y, λ) = e(x, y) − λy for −∞ < λ < ∞, and (x, y) ∈ Ω.

The function ℓ is called a pointwise Lagrangian in [6] and λ is a Lagrange multiplier.

THEOREM 2.1: Let (q_1, q_2, . . .) be an incrementally optimal sequence such that for i = 1, 2, . . .,
|E(q_i)| < ∞ and C(q_i) is in the interior of the range of C. Then (q_1, q_2, . . .) is a totally optimal
sequence.

PROOF: By the definition of incremental optimality, q_1 is optimal. Thus, by Corollary 5.2 of [6],
there exists a real number λ_1 such that for a.e. x ∈ R

(2.1) ℓ(x, q_1(x), λ_1) ≥ ℓ(x, y, λ_1) for |y| < ∞ and L(x) ≤ y ≤ U(x).

In other words, a necessary condition for q_1 to be optimal is that it maximize a pointwise Lagrangian
for some multiplier λ_1. Similarly, the incrementally optimal nature of q_2 implies the existence of a real
number λ_2 such that for a.e. x ∈ R

(2.2) ℓ(x, q_2(x), λ_2) ≥ ℓ(x, y, λ_2) for |y| < ∞ and q_1(x) ≤ y ≤ U(x).

In order to prove that q_2 is optimal it is sufficient to find a real number λ such that for a.e. x ∈ R

(2.3) ℓ(x, q_2(x), λ) ≥ ℓ(x, y, λ) for |y| < ∞ and L(x) ≤ y ≤ U(x).

The sufficiency of (2.3) follows from a well known result concerning Lagrange multipliers (see, for
example, [3], [8] or Theorem 2.1 of [6]).

By (2.1) and (2.2),

(2.4) λ_2(q_2(x) − q_1(x)) ≤ e(x, q_2(x)) − e(x, q_1(x)) ≤ λ_1(q_2(x) − q_1(x)) for a.e. x ∈ R.

Recall that q_2 ≥ q_1. If q_2(x) = q_1(x) for a.e. x ∈ R, then (2.3) holds for λ = λ_1. If q_2(x) > q_1(x) for x in a
set of positive measure, then (2.4) implies that λ_2 ≤ λ_1. In this case for a.e. x ∈ R and y such that |y| < ∞
and L(x) ≤ y ≤ q_1(x), we have

0 ≤ e(x, q_1(x)) − e(x, y) − λ_1(q_1(x) − y) ≤ e(x, q_1(x)) − e(x, y) − λ_2(q_1(x) − y).

That is, for a.e. x ∈ R,

(2.5) ℓ(x, y, λ_2) ≤ ℓ(x, q_1(x), λ_2) ≤ ℓ(x, q_2(x), λ_2) for |y| < ∞, L(x) ≤ y ≤ q_1(x).


Combining (2.5) and (2.2) we obtain (2.3) with λ = λ_2. Thus, q_2 is optimal. By repeating the argument
for q_3, q_4, . . ., the theorem is proved.

We now shift our attention to the case where R is a countable set. That is, for some countable
subset J of the integers, R = {x_j : j ∈ J}. Let

E(q) = Σ_{j∈J} e(x_j, q(x_j)) and C(q) = Σ_{j∈J} q(x_j).

Carry over the definitions of incrementally and totally optimal sequences in the obvious way. One may
use the method of proof given in Theorem 2.1 to show that incrementally optimal sequences are totally
optimal for the case where R is countable provided that the existence of a real number λ such that

(2.6) ℓ(x_j, q*(x_j), λ) = sup {ℓ(x_j, y, λ) : |y| < ∞ and L(x_j) ≤ y ≤ U(x_j)} for j ∈ J
is a necessary condition for q* to satisfy

(2.7) E(q*) = max {E(q) : q ∈ Φ and C(q) = C(q*)}.

From Corollary 5.3 and Remark 2.3 of [6] we conclude that if e(x_j, ·) is a concave function for j ∈ J,
then (2.6) is necessary for (2.7). Thus, we may state the following theorem.

THEOREM 2.2: If R is countable and e(x_j, ·) is concave for j ∈ J, then any incrementally optimal
sequence is totally optimal.

The following example shows that one cannot remove the assumption that e(x_j, ·) is concave in
Theorem 2.2. The example also shows that (2.6) is not necessary for (2.7) when e(x_j, ·) is not concave
for j ∈ J.

EXAMPLE 2.3: Let R = {1, 2} be a doubleton set, L = 0, U(1) = 2, and U(2) = √3. Define

e(1, y) = ½y for 0 ≤ y ≤ 2,    e(2, y) = { ½y for 0 ≤ y ≤ 1;  ¼(y² + 1) for 1 < y ≤ √3 }.

Note that both e(1, ·) and e(2, ·) are everywhere differentiable. For 0 ≤ T ≤ 2 + √3, define

η(1, T) = { 0 for 0 ≤ T < √3;  T − √3 for √3 ≤ T ≤ 2 + √3 },

η(2, T) = { T for 0 ≤ T ≤ √3;  √3 for √3 < T ≤ 2 + √3 }.

Then η(i, ·), i = 1, 2, is increasing, and for each T ≥ 0, C(η(·, T)) = T and E(η(·, T)) gives the
maximum of E(q) over all nonnegative functions q defined on {1, 2} such that C(q) = T. Note that
for q* = η(·, 1), (2.6) is not satisfied for any λ. An example of a function q* which satisfies (2.7), but
for which there is no λ satisfying (2.6), is also given in [8].

One may check that the sequence of allocations (q₁, q₂), where q₁(1) = 1, q₁(2) = 0 and q₂(1) = 1,
q₂(2) = 1, is incrementally optimal. However, E(q₂) = 1 and C(q₂) = 2, while

E(η(·, 2)) = 2 − ½√3 > 1,

so that q₂ is not optimal, i.e., (q₁, q₂) is not totally optimal.
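The arithmetic of the example can be checked numerically. The sketch below uses our reading of the scanned definitions (e(1, y) = y/2 on [0, 2]; e(2, y) = y/2 for y ≤ 1 and (y² + 1)/4 for 1 < y ≤ √3); the grid-search helper `best` is ours, not the paper's, and the numbers change if that reading is wrong.

```python
import math

# Our reading of the effectiveness functions in Example 2.3 (an assumption):
# e(1, y) = y/2 on [0, 2]; e(2, y) = y/2 on [0, 1], (y^2 + 1)/4 on (1, sqrt(3)].
def e(cell, y):
    if cell == 1:
        return 0.5 * y
    return 0.5 * y if y <= 1.0 else 0.25 * (y * y + 1.0)

U = {1: 2.0, 2: math.sqrt(3.0)}

def best(T, lower=(0.0, 0.0)):
    """Maximize e(1,y1) + e(2,y2) over y1 + y2 = T, lower_i <= y_i <= U_i (grid search)."""
    val, n = -1.0, 20000
    lo, hi = lower[0], min(U[1], T - lower[1])
    for i in range(n + 1):
        y1 = lo + (hi - lo) * i / n
        y2 = T - y1
        if lower[1] - 1e-12 <= y2 <= U[2] + 1e-12:
            val = max(val, e(1, y1) + e(2, y2))
    return val

# q1 = (1, 0) is optimal at cost 1 (it ties with (0, 1); both give 1/2), and
# q2 = (1, 1) is optimal at cost 2 among q >= q1, with E(q2) = 1.  But the
# unconstrained optimum at cost 2 is E(eta(., 2)) = 2 - sqrt(3)/2 > 1.
```

Under these definitions `best(2)` is about 1.134 = 2 − ½√3, strictly above E(q₂) = 1, which is exactly the failure of total optimality.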

In [2, p. 328] it is claimed that (in our notation) the existence of a function η defined on R × [0, S]
such that η(·, T) is an optimal allocation of T amount of effort for each 0 ≤ T ≤ S and η(x, ·) is in-
creasing for x∈R guarantees that incrementally optimal sequences are totally optimal. Example 2.3
shows that for discrete R this claim does not hold. If in addition to the existence of a function η satisfying
the above conditions we have that for each amount of effort there is an almost everywhere unique opti-
mal allocation of that effort, then any incrementally optimal sequence is totally optimal. Although not
stated as such, this result is proven in [2].

Example 2.3 shows, of course, that optimal allocations need not be unique. Even when E and C are
defined as integrals with respect to Lebesgue measure on n-space as is done in section 1, optimal
allocations need not be unique. In fact, it is easy to see that if there exists a subset D of R having posi-
tive measure such that for x∈D the graph of e(x, ·) contains a nondegenerate straight-line segment of
slope λ, then there are amounts of effort for which an optimal allocation of that effort is not almost
everywhere unique.

REMARK 2.4: Let us return to the search situation described in section 1. That is, L(x) = 0,
U(x) = ∞ for x∈R, e(x, y) = f(x)b(x, y) for (x, y)∈Ω. Suppose that an optimal allocation q₁ has been
performed and that the search has failed to detect the target. Let f₁ be the posterior target location
density given failure to detect the target. Thus

(2.7) f₁(x) = f(x)[1 − b(x, q₁(x))] / [1 − E(q₁)].

For x∈R, let b₁(x, ·) be the conditional local effectiveness function at x given that q₁(x) search effort
density was placed at x and the target not detected. Then

(2.8) b₁(x, y) = [b(x, q₁(x) + y) − b(x, q₁(x))] / [1 − b(x, q₁(x))].

Suppose that h is an allocation of effort which is added onto the original allocation q₁, so that the re-
sulting total effort density is q₁(x) + h(x) for x∈R. Then

(2.9) E₁(h) = ∫_R f₁(x) b₁(x, h(x)) dx

is the conditional probability of detecting the target given that allocation q₁ failed.

Fix an increment of effort T. Suppose h* has the property that ∫_R h*(x) dx = T and

E₁(h*) = max {E₁(h) : h ≥ 0 and ∫_R h(x) dx = T}.


Then h* is sometimes called a conditionally optimal search. If we let q₂ = q₁ + h*, then we claim
(q₁, q₂) is an incrementally optimal sequence. To see this, we observe that by (2.7) and (2.8),

E₁(h) = [E(q₁ + h) − E(q₁)] / [1 − E(q₁)].

Thus maximizing E₁ subject to h ≥ 0 and ∫_R h(x) dx = T is equivalent to maximizing E subject to
q ≥ q₁ and C(q) = C(q₁) + T. The claim now follows from the definition of incremental optimality,
and we see that the concepts of incremental and conditional optimality coincide for searches of the type
discussed in this paper. Hence, under the conditions of Theorem 2.1 or 2.2, a sequence of conditionally
optimal searches (h₁, h₂, . . .) produces, by setting qₙ = h₁ + · · · + hₙ, a totally optimal sequence (q₁, q₂, . . .)
of search allocations.
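The identity E₁(h) = [E(q₁ + h) − E(q₁)]/[1 − E(q₁)] can be illustrated on a discretized version of the search problem. The exponential detection function b(x, y) = 1 − e^(−y), the prior f, and the particular allocations below are our illustrative choices, not the paper's:

```python
import numpy as np

# Discretized sketch of Remark 2.4; b, f, q1, and h are illustrative assumptions.
xs = np.linspace(0.0, 5.0, 501)
dx = xs[1] - xs[0]
f = np.exp(-xs)
f /= (f * dx).sum()                      # prior target density on [0, 5]

def b(y):                                # local detection function (assumed)
    return 1.0 - np.exp(-y)

q1 = np.where(xs < 2.0, 1.0, 0.0)        # some first allocation of effort
E_q1 = (f * b(q1) * dx).sum()            # detection probability of q1

f1 = f * (1.0 - b(q1)) / (1.0 - E_q1)    # posterior density, eq. (2.7)

def b1(y):                               # conditional effectiveness, eq. (2.8)
    return (b(q1 + y) - b(q1)) / (1.0 - b(q1))

h = 0.5 * np.ones_like(xs)               # an added increment of effort
E1_h = (f1 * b1(h) * dx).sum()           # conditional detection prob., eq. (2.9)
E_q2 = (f * b(q1 + h) * dx).sum()        # unconditional prob. of q1 + h
# The identity in the text: E1(h) = (E(q1 + h) - E(q1)) / (1 - E(q1)).
```

The cancellation of the 1 − b(x, q₁(x)) factors between (2.7) and (2.8) is pointwise, so the identity holds exactly, not just approximately.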


In this section we find conditions under which uniformly optimal search plans or allocation sched-
ules exist. More precisely, let J be an interval of real numbers. Then an allocation schedule over J is a
Borel function η defined on R × J such that for T∈J, η(·, T)∈Φ and for a.e. x∈R, η(x, ·) is increasing.
An allocation schedule η is uniformly optimal if

(3.1) C(η(·, T)) = T and E(η(·, T)) = max {E(q) : C(q) ≤ T} for T∈J.

This definition is a generalization of the definition of uniform optimality for search plans given by
Arkin in [1]. In the special case where E(q) gives the probability of detection resulting from the allocation
of search effort q, we call η a search plan. Then a uniformly optimal search plan maximizes the probabil-
ity of detection at each instant during the search. In order to prove the existence of such allocation plans
we define a notion of coverability similar to the one in [7].

Suppose p is a real-valued function defined on an interval J of real numbers. If p is concave, then
throughout the interior of its domain, p′ exists a.e. and is decreasing. Moreover, if p is continuous, then

p(t) − p(s) = ∫ₛᵗ p′(r) dr for s, t∈J.

By an extreme point of a concave function p, we mean a point on
its graph which does not lie on a chord joining two other points on the graph.

Define m(x, ·) to be the minimal concave majorant of e(x, ·) for all x∈R for which such a majorant
exists. We say that m covers e if the following conditions are satisfied:
(i) For a.e. x∈R, m(x, ·) exists and is continuous.
(ii) m is a Borel function.
(iii) e(x, y) = m(x, y) whenever (y, m(x, y)) is an extreme point of m(x, ·).
Note that condition (iii) is equivalent to assuming that e(x, ·) is upper semi-continuous at y such that
(y, m(x, y)) is an extreme point of m(x, ·). For q∈Φ we define

M(q) = ∫_R m(x, q(x)) dx

whenever the integral on the right exists.


Differentiation is always with respect to the last component of the argument, and is denoted by a
prime, e.g., for (x, y)∈Ω, e′(x, y) = lim_{h→0} [e(x, y + h) − e(x, y)]/h. Let m cover e. If a function q∈Φ and
a real number λ satisfy, for a.e. x∈R,

m′(x, y) ≥ λ for a.e. y such that L(x) < y < q(x)

m′(x, y) ≤ λ for a.e. y such that q(x) < y < U(x),

then we say that the pair (q, λ) satisfies the Neyman-Pearson inequalities. When e(x, ·) and m(x, ·) are
increasing and U(x) = ∞, it is convenient to define

e(x, ∞) = lim_{y→∞} e(x, y) and m(x, ∞) = lim_{y→∞} m(x, y).

Before proceeding with our main existence result, we prove two lemmas which will be useful in
this section. Lemma 3.1 relates closely to Theorem 1 and Remark 3 of [5].

LEMMA 3.1: Let m cover e. If there is a q*∈Φ such that E(q*) > −∞ and λ ≥ 0 such that for
a.e. x∈R

(3.2) (i) m′(x, y) ≥ λ for a.e. y such that L(x) < y < q*(x)

(ii) m′(x, y) ≤ λ for a.e. y such that q*(x) < y < U(x)

(iii) e(x, q*(x)) = m(x, q*(x)),

then

(3.3) E(q*) = max {E(q) : C(q) ≤ C(q*)}.

PROOF: By (3.2)(iii), M(q*) exists. It is an easily shown Neyman-Pearson result (see Theorem
1 of [5]) that for λ ≥ 0, (i) and (ii) imply

(3.4) M(q*) = max {M(q) : C(q) ≤ C(q*)}.

Suppose that there is an r∈Φ such that E(r) > E(q*) and C(r) ≤ C(q*). Since m majorizes e, we have

M(r) ≥ E(r) > E(q*) = M(q*),

which contradicts (3.4). This proves the lemma.

For λ > 0 and x such that m(x, ·) exists, we define

φᵤ(x, λ) = sup {y : y = L(x) or m′(x, y) ≥ λ}

φℓ(x, λ) = inf {y : y = U(x) or m′(x, y) ≤ λ}.

Then for λ > 0, we let

Iℓ(λ) = ∫_R φℓ(x, λ) dx,    Iᵤ(λ) = ∫_R φᵤ(x, λ) dx.


The functions φℓ and φᵤ will be our main tools for constructing solutions to the constrained extremal
problems considered here. The following lemma displays some of the properties of these functions.

LEMMA 3.2: Suppose m covers e and for a.e. x∈R, e(x, ·) is increasing. If −∞ < E(L) ≤ E(U)
< ∞ and |C(L)| < ∞, then the following hold:

(a) φᵤ(·, λ)∈Φ and φℓ(·, λ)∈Φ for λ > 0.

(b) Iℓ and Iᵤ are finite and decreasing.

(c) φℓ(x, ·) and Iℓ are right continuous and φᵤ(x, ·) and Iᵤ are left continuous.

(d) (φℓ(·, λ), λ) and (φᵤ(·, λ), λ) satisfy the Neyman-Pearson inequalities.

(e) A pair (q, λ), where q∈Φ and λ ≥ 0, satisfies the Neyman-Pearson inequalities if, and only if,
φℓ(x, λ) ≤ q(x) ≤ φᵤ(x, λ) for a.e. x∈R.

(f) For any λ > 0, we may find a Borel function a defined on R × [Iℓ(λ), Iᵤ(λ)], such that

(1) a(x, ·) is increasing for a.e. x∈R,

(2) C(a(·, T)) = T for Iℓ(λ) ≤ T ≤ Iᵤ(λ),

(3) (a(·, T), λ) satisfies the Neyman-Pearson inequalities for all Iℓ(λ) ≤ T ≤ Iᵤ(λ).

(g) lim_{λ→∞} Iᵤ(λ) = C(L).

(h) For λ > 0 and x such that m(x, ·) exists, (φᵤ(x, λ), m(x, φᵤ(x, λ))) and (φℓ(x, λ),
m(x, φℓ(x, λ))) are extreme points of m(x, ·).

PROOF: A straightforward verification shows that φℓ(·, λ) and φᵤ(·, λ) are Borel functions for
each λ > 0 and that (a) holds. Thus, the integrals Iℓ(λ) and Iᵤ(λ) are well defined for each λ > 0.

For a.e. x∈R, the following hold. Since e(x, ·) is increasing, m(x, ·) is increasing. If U(x) is finite,
then (U(x), m(x, U(x))) is an extreme point and m(x, U(x)) = e(x, U(x)). If U(x) = ∞, then the in-
creasing nature of e(x, ·) and the minimal nature of m(x, ·) yields m(x, ∞) = e(x, ∞). Since |C(L)| < ∞,
L(x) is finite and m(x, L(x)) = e(x, L(x)).

To prove (b), we observe that

−∞ < E(L) = M(L) ≤ E(U) = M(U) < ∞.

Thus, for a.e. x∈R, m(x, L(x)) and m(x, U(x)) are finite.
Since m(x, ·) is increasing, we have for a.e. x∈R,

m(x, U(x)) − m(x, L(x)) ≥ ∫_{L(x)}^{z} m′(x, y) dy ≥ (z − L(x)) m′(x, z) for L(x) < z < U(x).

Thus, m′(x, z) ≤ [m(x, U(x)) − m(x, L(x))]/(z − L(x)), and it follows that

φᵤ(x, λ) ≤ (1/λ)[m(x, U(x)) − m(x, L(x))] + L(x) for λ > 0.

Hence,

−∞ < C(L) ≤ Iℓ(λ) ≤ Iᵤ(λ) ≤ (1/λ)[M(U) − M(L)] + C(L) < ∞ for λ > 0,


which proves that Iℓ and Iᵤ are finite. The decreasing nature of m′(x, ·) for a.e. x∈R guarantees that
φᵤ(x, ·) and φℓ(x, ·) are decreasing for a.e. x∈R. Thus, (b) follows.

The left continuity of φᵤ(x, ·) and the right continuity of φℓ(x, ·) for a.e. x∈R follow from their
definitions and the decreasing nature of m′(x, ·). The monotone convergence theorem and the finite-
ness of Iᵤ and Iℓ may be used to show the left and right continuity of Iᵤ and Iℓ, respectively. Thus, (c)
follows.

Properties (d) and (e) follow directly from the definitions of φℓ and φᵤ. In order to prove (f), we use
a device of Arkin's [1] and define for 0 ≤ s ≤ ∞

h_λ(x, s) = { φᵤ(x, λ) if |x| < s;  φℓ(x, λ) if |x| ≥ s },

H_λ(s) = ∫_R h_λ(x, s) dx.

By the monotone convergence theorem, H_λ is continuous. Moreover, H_λ is increasing and

H_λ(0) = Iℓ(λ),    H_λ(∞) = Iᵤ(λ).

Thus for Iℓ(λ) ≤ T ≤ Iᵤ(λ), we may choose ξ(T) such that H_λ(ξ(T)) = T. Defining a(x, T)
= h_λ(x, ξ(T)) for x∈R, we see that a satisfies conditions (1) and (2) of (f). Condition (3) follows from
(e). Property (g) follows easily from the monotone convergence theorem and the definition of φᵤ. Prop-
erty (h) may be verified from the definitions of φℓ, φᵤ, and an extreme point. This completes the proof.
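Properties (b) and (g) of Lemma 3.2 can be seen concretely in the concave case m = e. The detection function e(x, y) = f(x)(1 − e^(−y)) with L = 0 on R = [0, 4] is our illustrative choice, not the paper's; for it, m′(x, y) = f(x)e^(−y), so φᵤ has the closed form used below:

```python
import numpy as np

# Numerical illustration of phi_u and I_u for a concave case where m = e:
# e(x, y) = f(x)(1 - exp(-y)), L = 0, R = [0, 4] (our illustrative choices).
# Here m'(x, y) = f(x) exp(-y), so phi_u(x, lam) = max(0, ln(f(x)/lam)).
xs = np.linspace(0.0, 4.0, 401)
dx = xs[1] - xs[0]
f = np.exp(-xs) / (1.0 - np.exp(-4.0))   # density on [0, 4]

def phi_u(lam):
    return np.maximum(0.0, np.log(f / lam))

def I_u(lam):
    return (phi_u(lam) * dx).sum()

lams = [0.01, 0.05, 0.1, 0.5]
vals = [I_u(l) for l in lams]
# (b): I_u is finite and decreasing; (g): I_u(lam) = C(L) = 0 once lam >= sup f.
```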

THEOREM 3.3: Suppose m covers e, and for a.e. x∈R, e(x, ·) is increasing. If −∞ < E(L) ≤ E(U)
< ∞ and |C(L)| < ∞, then there exists a uniformly optimal allocation schedule η over [C(L), C(U)).

PROOF: We consider first the case where Iᵤ(0) ≡ lim_{λ→0} Iᵤ(λ) = C(U). The case Iᵤ(0) < C(U)
requires only routine modifications which are discussed at the end of the proof. We take η(·, C(L)) = L.

Since Iᵤ is monotone, it has only a countable number of discontinuities. Let K be a countable
index set such that {λ_k : k∈K} is the set of discontinuity points of Iᵤ. Let J_k = (Iᵤ(λ_k+), Iᵤ(λ_k)] for
k∈K. The intervals J_k are disjoint and are the jump intervals at the discontinuity points of Iᵤ. For

T ∈ (C(L), C(U)) − ∪_{k∈K} J_k, let

λ*(T) = sup {λ : Iᵤ(λ) ≥ T}.

By the left continuity of Iᵤ, Iᵤ(λ*(T)) = T.

For T∈J_k, let λ*(T) = λ_k and choose a function a_k defined on R × J_k to have the properties of
a in (f) of Lemma 3.2. Define

η(x, T) = { φᵤ(x, λ*(T)) if C(L) < T < C(U) and T ∉ ∪_{k∈K} J_k;
            a_k(x, T) if T∈J_k and k∈K. }


Then for each C(L) < T < C(U), C(η(·, T)) = T and (η(·, T), λ*(T)) satisfies the Neyman-Pearson
conditions. Since m covers e and property (h) of Lemma 3.2 holds, we have that for each C(L) < T
< C(U), e(x, η(x, T)) = m(x, η(x, T)) for a.e. x∈R. Thus, by Lemma 3.1, η satisfies (3.1).

To verify that η(x, ·) is increasing for a.e. x∈R, we let R′ be the set of x∈R such that m(x, ·)
exists. Then by the fact that m covers e, R − R′ has measure 0. Suppose it is not the case that η(x, ·) is
increasing for a.e. x∈R. Then there is an x∈R′ and numbers S and T such that C(L) < T < S < C(U) and

(3.4) η(x, S) < η(x, T).

Since (η(·, T), λ*(T)) and (η(·, S), λ*(S)) satisfy the Neyman-Pearson inequalities for all x∈R′,
we have

λ*(T) ≤ m′(x, y) ≤ λ*(S) for a.e. y such that η(x, S) < y < η(x, T).

One may check that λ* is a decreasing function, so that λ*(T) = λ*(S). Thus, for some k∈K, T and
S are both in the closure of J_k. However, a_k(x, ·) is constructed by property (f) of Lemma 3.2 to be
increasing on the closure of J_k. This contradicts (3.4) and proves the theorem for the case where
Iᵤ(0) = C(U).

If Iᵤ(0) < C(U), we proceed as before for C(L) < T < Iᵤ(0). We then define

φᵤ(x, 0) = lim_{λ→0} φᵤ(x, λ).

From the increasing nature of e(x, ·), it follows that if q∈Φ and if q(x) ≥ φᵤ(x, 0) for x∈R, then q will
satisfy (3.2) with λ = 0. Hence, to complete the definition of η(x, ·) for Iᵤ(0) ≤ T < C(U), one need
only choose η so that η(x, T) ≥ φᵤ(x, 0) and C(η(·, T)) = T, which may be easily done. This completes
the proof.

Observe that the hypotheses of Theorem 3.3 may be weakened to require that m(x, ·) rather than
e(x, ·) be increasing for a.e. x∈R. The theorem remains unchanged except that η must be restricted
to [C(L), Iᵤ(0)]. This is no real restriction since for q ≥ φᵤ(·, 0), E(q) ≤ E(φᵤ(·, 0)).
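The construction in the proof of Theorem 3.3 can be carried out numerically when Iᵤ is continuous and strictly decreasing, so that no jump intervals J_k arise. The concave case e(x, y) = f(x)(1 − e^(−y)) below is again our illustrative choice; λ*(T) is found by bisection rather than by the sup formula of the proof:

```python
import numpy as np

# Sketch of Theorem 3.3's construction for a concave case (m = e, no jumps):
# e(x, y) = f(x)(1 - exp(-y)) on R = [0, 4] is our illustrative assumption,
# with phi_u(x, lam) = max(0, ln(f(x)/lam)) and lambda*(T) found by bisection.
xs = np.linspace(0.0, 4.0, 401)
dx = xs[1] - xs[0]
f = np.exp(-xs) / (1.0 - np.exp(-4.0))

def I_u(lam):
    return (np.maximum(0.0, np.log(f / lam)) * dx).sum()

def lam_star(T):
    lo, hi = 1e-9, 10.0                  # I_u(lo) > T > I_u(hi) = 0
    for _ in range(100):                 # bisection; I_u is decreasing in lam
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if I_u(mid) > T else (lo, mid)
    return 0.5 * (lo + hi)

def eta(T):
    """Allocation schedule: eta(., T) = phi_u(., lambda*(T))."""
    return np.maximum(0.0, np.log(f / lam_star(T)))

# C(eta(., T)) = T for each T, and eta(x, .) is increasing in T for each x.
```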

Theorem 2 of Arkin [1] is similar to Theorem 3.3 above with the exception that [1] claims that
there exists a function β such that

η(x, T) = ∫₀ᵀ β(x, s) ds,    C(η(·, T)) = T,

and η is uniformly optimal. However, the following is a counterexample to Theorem 2 of [1]. (More-
over, the proof in [1] is not sufficient to show the truth of Theorem 3.3.)

Let R = [0, 1], L = 0, U = ∞, and



for x∈R. It is clear that any uniformly optimal search plan η must have the property that for a.e. x∈R,
η(x, ·) jumps from 0 to 1 at some point T, but there is no function β which produces this behavior for η.


Under the conditions of Theorem 3.3, we have shown that there exists, for any C(L) ≤ T < C(U),
a q* such that C(q*) = T and E(q*) = max {E(q) : q∈Φ and C(q) ≤ T}. Theorem 8 of [7] provides a
similar existence result whenever m covers e and −∞ < C(L) ≤ C(U) < ∞. In comparison, Theorem
3.3 of this paper removes the restriction that C(U) < ∞, but adds monotonicity conditions on e(x, ·)
and boundedness conditions on E. In [6] there is also a discussion of related existence theorems.

One might conjecture that Theorem 3.3 would remain true without assuming that e(x, ·) is in-
creasing, provided that we assumed |E(q)| < B for some number B and all q∈Φ. Similarly, one might
conjecture that the restriction C(L) > −∞ could be omitted. However, the following two counter-
examples show both of these conjectures to be false.

EXAMPLE 3.4: Let R = [1, ∞), L = 0, and U(x) = x + 1/x² for x∈R.
For x∈R, define

e(x, y) = { y for 0 ≤ y ≤ 1/x²;  1/x² − (y − 1/x²)/x³ for 1/x² < y ≤ x + 1/x² }.

Note that m = e and that |E(q)| ≤ 1 for all q∈Φ. Suppose q* is optimal and

∞ > C(q*) > 1.

By Corollary 7.2 of [6] there exists a λ such that

(3.5) e(x, q*(x)) − λq*(x) = sup {e(x, y) − λy : 0 ≤ y ≤ x + 1/x²} for a.e. x∈R.

Since e(x, ·) is concave for x∈R, this implies

e′(x, y) ≥ λ for 0 < y < q*(x)

e′(x, y) ≤ λ for q*(x) < y < x + 1/x², for a.e. x∈R.

One may check that if λ ≥ 0, then C(q*) ≤ 1. Thus, the above λ must be negative. It follows from the
above inequalities that q*(x) = x + 1/x² for x³ > −1/λ. Thus, C(q*) = ∞, which contradicts our assump-
tion that C(q*) < ∞. Thus, one cannot replace the monotonicity of e(x, ·) by boundedness of E in
Theorem 3.3.
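A quick numerical check of this example, using our reading of the scanned definition of e (if that reading differs from the original, the boundedness figures below change):

```python
import math

# Our reading of Example 3.4 (an assumption): e(x, y) = y on [0, 1/x^2], then
# e(x, y) = 1/x^2 - (y - 1/x^2)/x^3 on (1/x^2, x + 1/x^2], so e(x, U(x)) = 0.
def e(x, y):
    c = 1.0 / (x * x)
    return y if y <= c else c - (y - c) / x ** 3

# e peaks at 1/x^2 and falls back to 0 at U(x), so |E(q)| <= integral of 1/x^2 = 1.
endvals = [e(x, x + 1.0 / x ** 2) for x in (1.0, 2.0, 5.0, 50.0)]

def tail_cost(lam, X):
    """Lower bound on C(q*) over [x0, X] when q*(x) = U(x) >= x for x^3 > -1/lam."""
    x0 = (-1.0 / lam) ** (1.0 / 3.0)
    return 0.5 * (X * X - x0 * x0)
# tail_cost grows without bound as X increases, so C(q*) = infinity.
```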

EXAMPLE 3.5: Let R = [1, ∞), L = −1, U = 1, and

e(x, y) = y/x² for −1 ≤ y ≤ 1.

Observe that e = m and all the conditions of Theorem 3.3 are satisfied except that C(L) = −∞. Suppose
q* is optimal and C(q*) is finite. Again by Corollary 5.2 of [6] there exists a λ such that

e(x, q*(x)) − λq*(x) = sup {e(x, y) − λy : −1 ≤ y ≤ 1} for a.e. x∈R.



It follows that

e′(x, y) ≥ λ for −1 < y < q*(x)

e′(x, y) ≤ λ for q*(x) < y < 1, for a.e. x∈R.

Since e′(x, y) = 1/x², this yields

q*(x) = { 1 for x² < 1/λ;  −1 for x² > 1/λ }.

Hence, either C(q*) = −∞ or C(q*) = +∞, contrary to the assumption that C(q*) is finite. Thus, we
cannot omit the condition C(L) > −∞ in Theorem 3.3.

REMARK 3.6: In the search theory case where L(x) = 0, U(x) = ∞ for x∈R and e(x, y) = f(x)
b(x, y) for (x, y)∈Ω, the conditions of Theorem 3.3 will be satisfied if b(x, ·) is right continuous for
x∈R. Since b(x, ·) is increasing and |E(q)| ≤ 1 for q∈Φ, the only condition that is not obviously sat-
isfied is the coverability condition. However, since b(x, ·) is increasing and right-continuous, it is upper
semi-continuous. Thus, e(x, ·) has a minimal concave majorant m(x, ·) which is continuous, and one
may check that e(x, y) = m(x, y) whenever (y, m(x, y)) is an extreme point of m(x, ·). It can be
shown that since e is Borel, m is a.e. equal to a Borel function. Thus the conditions for coverability
are satisfied. It follows that whenever the local effectiveness function is right continuous and the
target location distribution is given by a density function on Euclidean n-space, a uniformly optimal
search plan exists. Note that uniformly optimal search plans may be used to produce sequences
which are both incrementally and totally optimal.


REFERENCES

[1] Arkin, V. I., "Uniformly Optimal Strategies in Search Problems," Theor. Probability Appl. 9,
674-680 (1964).
[2] Dobbie, James M., "Search Theory: A Sequential Approach," Nav. Res. Log. Quart. 10, 323-334
(Dec. 1963).
[3] Everett, Hugh, "Generalized Lagrange Multiplier Method for Solving Problems of Optimum
Allocation of Resources," Operations Res. 11, 399-417 (1963).
[4] Koopman, B. O., "The Theory of Search: III. The Optimum Distribution of Searching Effort,"
Operations Res. 5, 613-626 (1957).
[5] Wagner, D. H., "Non-Linear Functional Versions of the Neyman-Pearson Lemma," SIAM Rev.
11, 52-65 (June 1969).
[6] Wagner, D. H. and L. D. Stone, "Necessity and Existence Results on Constrained Optimization
of Separable Functionals by a Multiplier Rule," to appear in SIAM J. Control 12 (1974).
[7] Wagner, D. H. and L. D. Stone, "Optimization of Allocations Under a Coverability Condition,"
to appear in SIAM J. Control 12 (1974).
[8] Zahl, S., "An Allocation Problem With Applications to Operations Research and Statistics,"
Operations Res. 11, 426-441 (1963).


Robert Thomas Crow 

School of Management 
State University of New York at Buffalo 


Many Naval systems, as well as other military and civilian systems, generate multiple 
missions. An outstanding problem in cost analysis is how to allocate the costs of such mis- 
sions so that their true costs can be determined and resource allocation optimized. This 
paper presents a simple approach to handling this problem for single systems. The approach 
is based on the theory of peak-load pricing as developed by Marcel Boiteux. The basic 
principle is that the long-run marginal cost of a mission must be equal to its "price." The 
implication of this is that if missions can cover their own marginal costs, they should also 
be allocated some of the marginal common costs. The proportion of costs to be allocated is 
shown to be a function of not only the mission-specific marginal costs and the common marginal
costs, but also of the "mission price." Thus, it is shown that measures of effectiveness must
be developed for rational cost allocation. The measurement of effectiveness has long been 
an intractable problem, however. Therefore, several possible means of getting around this 
problem are presented in the development of the concept of relative mission prices. 


This paper is an attempt to provide a new method of allocating the common costs of new invest- 
ments in a multi-mission system to individual missions in a way that is (1) operational, (2) objective, 
and (3) defensible from the point of view of efficient resource allocation.† The most important reason
for allocating common costs to individual missions is to provide guidance for procurement of systems 
and to estimate the costs of accomplishing given missions by alternative systems. If common costs are 
properly allocated, it is possible to estimate the true costs of accomplishing a mission with one system 
compared to another. 

For an illustration of the problem, consider Table 1. Five systems are shown that can be combined 
to accomplish three missions. Systems A, B, and C are single-mission systems and therefore, by defini- 
tion, have no common costs. Systems D and E are multi-mission systems characterized by significant 
common costs, as well as incremental costs which are specific to each mission. How much does it cost 
to accomplish the missions by the use of multi-mission systems? Which systems— single-mission, 
multi-mission, or some combination — should be procured? Obviously, military systems problems 
are too complex to be accurately characterized by such simple questions. Yet it appears that in day-to- 

*The work on which this article is based was performed for the Chief of Naval Operations, Systems Analysis, as part of
Contract No. N00014-70-C-0086 with Mathematica, Inc. This paper is a revision of portions of a report [4] prepared for that

† Common costs are defined as those which are incurred by a single system regardless of which of a number of missions is
being performed. They may arise from either operation or investment. It should be clear that, since the subject is investment
decisions, it is the incremental common costs that are to be allocated.


432 R. T. CROW

day operations, they are often posed in this fashion — for first-approximation purposes, if not for final 


TABLE 1. Comparison of Costs of Single- and Multi-Mission Systems 


As an example of the consequences of misallocating common costs, if in a given system all common 
costs are allocated to a particular important mission, it may appear that the system's capability in 
that mission is costly relative to other systems and it may not be purchased. Even if it is purchased, 
it may be underutilized. On the other hand, other mission capabilities of the system may appear to 
be less costly than they are in fact, leading to overpurchase or overutilization. 

In general, the allocation of common costs has been avoided by economists. Most work on common 
costs has focused on short-run marginal cost analysis and hence only on the costs that are variable 
for each specific output (or mission, in our context).* This focus on short-run problems side-steps 
the issue of how to handle common costs in investment decisions, where there are normally several 
alternative courses of action and no costs are fixed. For pricing and other types of decisions, in both the 
private and public sectors, if other than short run marginal or incremental prices are considered, 
reliance has usually been placed on arbitrary allocations of common costs to some or all outputs. This 
was the general rule until 1949, when Marcel Boiteux wrote an ingenious and basic article on peak- 
load pricing for electrical utilities, translated into English in 1964 (Ref. [2]). In the approach presented 
here, two principles will be emphasized: (1) (following Boiteux) efficient allocation of common 
costs can and must be based on "marginal" conditions; and (2) efficient allocation must be based 
explicitly on considerations of how well a given system performs a given mission, as well as an assess- 
ment of the need for capability in that mission. 


A basic theorem of resource allocation is that in a competitive economy the maximization of output
in equilibrium occurs when price (P) equals the cost of producing one additional unit of output (mar- 
ginal cost or MC). The P = MC output can be proven to be optimal if competitive conditions hold 
throughout the economy (assuming the existence of U-shaped average cost curves). This is a funda- 
mental principle in the ensuing discussion on cost allocation. 

*For an example, see Ref. [3, ch. 5]. Professor William Baumol has pointed out in correspondence that there have been a 
number of recent exceptions to this general rule for civilian systems, such as utilities. In particular, see Ref. [6]. 


Furthermore, it is necessary to recognize that before any costs are sunk, they are all variable. 
Thus, in planning for investment, the criterion that price must equal marginal cost means that the 
marginal cost measure must include investment cost. Assuming for the moment that a set of "prices" 
can be established for the value of accomplishing a military mission, the widely used criterion of price 
equal to short-run marginal costs (marginal operating costs) is only valid as the minimum price at 
which a particular system should be used in a given mission. In other words, for a given system, if the 
mission is not needed sufficiently to be worth sacrificing enough resources to pay for its marginal 
operating costs, then the mission should not be performed. 

An example of the possible consequences of using only short-run marginal costs in system decisions
is shown in Figures 1 and 2. The amount of output of each of the two systems is q, and the vertical distances
OF₁ and OF₂ represent investment costs, while the vertical distances of the shaded areas represent
marginal costs. In this case, it is clear that short-run marginal costs are lower for system 2 than system
1; however, in the long run, the total cost of system 1 is less because its investment costs are lower.
Therefore, it should be chosen in spite of its higher short-run marginal costs.



Figure 1. Output with high marginal operating costs and low marginal investment costs, System 1. 




Figure 2. Output with low marginal operating costs and high marginal investment costs, System 2.


In order to include investment costs in the marginal cost measures, it is necessary to distinguish
long-run from short-run costs. Long-run costs represent situations where output is variable through
investment, rather than by changing utilization of existing capacity (which can occur over the short
run). The question is how to reconcile the short-run condition of price equal to marginal cost with
the necessary condition of long-run marginal costs being covered. The solution is deceptively simple:
for a given objective to be achieved, purchase that number of units of the system at which the price
of each unit equals both the short-run and long-run marginal costs. It is necessary to demonstrate this.

Real systems often are relatively inflexible in their capability. That is, after a point, significant
increases in operating costs will expand output very little, i.e., the cost curves have an "elbow" where
they become vertical when capacity is reached. It can be proven that for a family of such short-run
curves, representing either expanded units or additional units, the envelope of the short-run curves
is the long-run total-cost curve. This is illustrated in Figure 3. This is the case that will be dealt with


Figure 3. Relationship of long-run to short-run total costs.

here in demonstrating that optimum resource allocation calls for equality of short- and long-run mar-
ginal costs.† In the ensuing discussion it will be useful to identify short-run costs with operating costs,
and long-run costs as operating plus investment costs. In dealing with a system with very little flexi-
bility, we are able to illustrate a means of approximating the solution for the case in which the system
is completely inflexible.

The following notation will be used:

C: total cost variable over the short run (operating costs only)

C̄: total cost variable over the long run (operating plus investment costs)

C_f: total investment cost (fixed in the short run)

q: output of a system

q₀: capacity which meets some given requirement

z: marginal operating cost (variable in the short run) = dC/dq

x: marginal investment cost (variable in the long run) = dC_f/dq₀

y: marginal long-run cost = dC̄/dq₀

w: an approximation of z for q < q₀, defined by (C − C_f)/q, i.e., the average avoidable cost.

*A rigorous treatment is presented in an appendix to Ref. [2].

†This can be relaxed, but at the sacrifice of simplicity in exposition and application. See appendix to Boiteux [2].

For q < q₀, let the total cost function be*

(1) C = C_f + wq.

Over the long run, adjustment can be made to the required capacity (i.e., investment can be varied).
Therefore (1) can be expressed as

(2) C̄ = C_f(q₀) + wq₀,

and, assuming that w does not vary with the number of units,

(3) dC̄/dq₀ = dC_f/dq₀ + w = y.

That is, the long-run marginal cost of a number of units is

(4) y = x + w,

by the definitions of y, x, and w, i.e., the long-run marginal cost is equal to the sum of the marginal
investment cost and the average avoidable cost.

If there is true inflexibility at q₀, short-run marginal cost is indeterminate. However, if there is a
slight bit of flexibility, z becomes very large as q → q₀. Therefore, it becomes equal to y at some point on
its vertical arm. At this point, long-run and short-run marginal costs are equal, which establishes the
solution of price equal to long-run and short-run marginal costs,

(5) p = y = x + w = z.

This is illustrated in Figure 4. 
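The "vertical arm" argument behind (5) can be sketched numerically; all cost figures below are invented for illustration, and the linear arm with slope a stands in for whatever slight flexibility the system has:

```python
# Sketch of the argument behind (5): with a little flexibility near capacity
# q0, short-run marginal cost z rises steeply and meets the long-run marginal
# cost y = x + w just past q0.  All numbers are invented for illustration.
w, x, q0 = 4.0, 10.0, 100.0

def z(q, a):
    """Short-run marginal cost: w below capacity, rising with slope a beyond."""
    return w if q <= q0 else w + a * (q - q0)

y = x + w                                # long-run marginal cost, eq. (4)

def crossing(a):
    """Output level at which z equals y on the steep arm: q0 + x/a."""
    return q0 + x / a

# The steeper the arm (larger a), the closer the crossing lies to capacity q0,
# recovering the completely inflexible case in the limit.
```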

The Basic Principle of Common Cost Allocation 

Assume that a given system can perform two missions, and its investment cost is common to both.
That is, there are no characteristics of the system that can be attributed uniquely to one mission or the other.

* This section follows Boiteux [2] closely. Any errors are likely to be mine, not his.

t Linearity is assumed for convenience and because in applied work one often has only linear approximations in any case. The 
linearity assumption does not appear to have a material effect on the analysis. 



Figure 4. Equality of short-run and long-run marginal costs at capacity output. 

In the case of two missions being performed by a single system, the cost functions for each of them,
with flexible capacity, are

(6) C₁ = C_f(q₀) + w₁q₁

    C₂ = C_f(q₀) + w₂q₂,

where q₁ and q₂ represent the output of missions 1 and 2, respectively. Given a particular output
requirement, say q₁, the requirement for optimum resource allocation is that the total long-run cost
for both missions together be minimized, i.e.,

(7) ∂(C₁ + C₂)/∂q₀ = 0.

To establish the equality of long- and short-run marginal costs, consider the differential of C₁ + C₂,

(8) d(C₁ + C₂) = (∂C₁/∂q₁)dq₁ + (∂C₂/∂q₂)dq₂ + [∂(C₁ + C₂)/∂q₀]dq₀.

The first two terms on the right hand side are short-run marginal costs, z\ and z 2 , times their respective 
variations in output. The third term must be equal to zero for cost minimization to hold. 
The differential d{C\ + C 2 ) may also be written as 


dC\ + dC 2 = - — dq + Widqi + - — dq + w 2 dq 2 
loq J ldq J 

where dq x — dq% = dq * (9) may be divided by dq to yield 



dCi + dCi n dC/ 

j = 2~ h W\ + W2 . 

dq aq 

Since dCf/dq has been defined above as x, the optimum condition of equality of short- and long-run 
marginal costs is 


z\ + z% — 2x + Wi + wi. 

For the prices of two missions to be equal to the long-run marginal costs of a given system capacity, 


Pi + Pi. = 2x + w\ + w 2 . 

The allocation of common investment costs follows immediately from (12) which implies, 


(13) (p_1 − w_1)/2x + (p_2 − w_2)/2x = 1.

That is, the share of common costs to be allocated to each mission performed by the system is equal 
to the corresponding term in (13). 
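As a numerical sketch of the rule in (13), with hypothetical prices and costs chosen to satisfy (12) (p_1 = 70, w_1 = 20, p_2 = 55, w_2 = 25, x = 40, so that p_1 + p_2 = 2x + w_1 + w_2 = 125):

```python
# Hypothetical illustration of the allocation rule in (13); the numbers are
# invented for this sketch and do not come from the paper's tables.
def common_cost_shares(p1, w1, p2, w2, x):
    """Return each mission's share of the common cost 2x, per equation (13)."""
    s1 = (p1 - w1) / (2 * x)
    s2 = (p2 - w2) / (2 * x)
    assert abs(s1 + s2 - 1.0) < 1e-9  # (13): the shares must sum to one
    return s1, s2

print(common_cost_shares(70, 20, 55, 25, 40))  # → (0.625, 0.375)
```

Mission 1, whose price exceeds its operating cost by more, bears the larger share; if p_2 fell to w_2, its share would drop to zero, which is the Figure 5 case.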

Before turning to the problem of how the prices of missions are to be determined, it is instructive 
to consider two particular cases. One is the case where one of the missions (mission 2, for example) 
has a price such that it is just worth its marginal operating costs, p_2 = w_2. In this case all common
costs are allocated to mission 1 by (13). This is illustrated in Figure 5. 

Figure 5. One system bearing all common costs and the other system bearing only operating costs.

* Boiteux [2, appendix] has shown that this is likely to be a reasonably good approximation over a wide range of conditions.



It is important to note that this is a generalization of the "two-ship" method of allocating invest- 
ment costs currently in use (Grey, [5] pp. 2, 3, sec. 2-4). In this method, "major" missions are al- 
located the entire common investment cost of the system plus their incremental investment costs, while 
"minor" missions are allocated only their incremental investment costs. 

In the case shown in Figure 6, the problem is different in that each mission's price is such as to be 
able to cover at least a portion of its common marginal investment costs. In this case, the question 
arises: Is one mission allocated only operating costs and the other allocated both its operating and the 
common investment costs? And further, if the common investment costs are shared, how are they di- 
vided? The answer is that since p_1 and p_2 both exceed w (assumed to be the same for both missions to
keep the diagram simple), common costs must be allocated according to (13), where both terms will be 
greater than zero. Thus each mission will bear a share of the common cost. 





Figure 6. Both systems bearing common costs.



Military cost-effectiveness analysis generally takes one or another of the following forms: (a) 
maximize effectiveness subject to a cost constraint, or (b) minimize cost subject to achieving a given 
level of effectiveness. In this paper, attention will be devoted to the latter, but the basic approach is 
also applicable to the former. 

As seen from the discussion above, it is critical to be able to set the prices of various missions. 
At first, this requirement appears unorthodox and difficult; but, in fact, it is not far from existing prac- 
tice. In the case of a single system, the price of one unit of the system (say one plane), plus the wages of 
personnel, the expenditure for fuel, and whatever else is necessary for the unit to perform its mission, 
can be considered to be the price of the mission. Expressed somewhat differently, the amount of ex-
penditure necessary to meet a given mission requirement divided by the units of a system which must 
be used is the price of that mission as performed by a particular system. 



The Alternative System Method 

Consider two single-mission systems which perform different missions. The expenditures on these 
two systems are shown in Figures 7 and 8. In order to meet effectiveness requirements for each, q_1
and q_2 units are required, respectively. To meet these requirements, expenditures of Op_1C_1q_1 and Op_2C_2q_2
are necessary, implying mission prices of p_1 and p_2.

If a third system performed both missions jointly, it might be procured in such quantity (q_3) that
it met the requirements of mission 1 and part of the requirements of mission 2. This is shown in Figure 9. 
If we assume that the operating costs of system 3 are identical to those of systems 1 and 2, and the in- 
vestment costs are exactly the same for system 3 as for the sum of systems 1 and 2, then, at the optimum, 
the prices of the missions for the multi-mission system are equal to those of the single-mission systems. 

p_1 + p_2 = 2(x_3 + w).

This follows from (12), and allocation of common costs (investment costs in this case) follows (13). 
That is, in this particular case the shares of common costs differ solely because of the difference in 
mission prices. 





FIGURE 7. Average price and required units of system in Mission 1 (alternative system method). 



FIGURE 8. Price and required units in Mission 2 (alternative system method). 




Figure 9. Costs and prices of both missions with a single system. 

Clearly the assumption of identical costs is unrealistic. There is no reason to assume that a multi-
mission system will cost precisely the same as the sum of two single-mission systems. In fact, if it did 
there would be no apparent reason to buy it. The only apparent reason for a multi-mission system is 
that it meets a set of requirements less expensively than single-mission alternatives. Thus, to preserve 
the equality of prices and marginal costs for the multi-mission system, it is necessary to scale either the 
marginal costs or the prices. Since it makes no difference which is chosen, prices are scaled: 

(14) p_13 + p_23 = n(p_1 + p_2) = 2(x_3 + w),

where p_13 and p_23 are the prices of missions 1 and 2 as performed by system 3, and

n = 2(x_3 + w)/[(x_1 + w) + (x_2 + w)]

from the conditions

p_1 = x_1 + w and p_2 = x_2 + w,

established in (5). Thus, the proportions of common investment cost (for example) allocated to different 
missions are the respective terms of 

(np_1 − w)/2x_3 + (np_2 − w)/2x_3 = 1.

Next, consider the systems where certain investment and operating costs can be attributed to 
specific missions. The following notation will be used: 
p_13 = price of mission 1 performed by system 3
p_23 = price of mission 2 performed by system 3
w_13 = an approximation of marginal operating costs for mission 1 performed by system 3 (see definition of w prior to Equation (1))
w_23 = an approximation of marginal operating costs of mission 2 performed by system 3
w_3 = an approximation of marginal operating costs of system 3 common to missions 1 and 2
z_13 = marginal operating cost of mission 1 performed by system 3
z_23 = marginal operating cost of mission 2 performed by system 3
x_13 = marginal investment cost of mission 1 performed by system 3
x_23 = marginal investment cost of mission 2 performed by system 3
x_3 = marginal investment cost of system 3 common to missions 1 and 2

The critical condition for optimum resource allocation in the simple case presented in (11) was 

z_1 + z_2 = 2x + w_1 + w_2.

If there are investment costs and operating costs common to both missions, as well as investment costs
and operating costs specific to each mission, the specific costs are directly attributable to their respec-
tive missions. The condition can be written as

z_13 + z_23 = 2(x_3 + w_3) + x_13 + w_13 + x_23 + w_23,

which preserves the equality of short- and long-run costs for the system. Since the condition for optimum 
resource allocation is that output is such that price equals marginal cost, 

p_13 = z_13 and p_23 = z_23.


(15) p_13 + p_23 = 2(x_3 + w_3) + x_13 + w_13 + x_23 + w_23

or

(p_13 − x_13 − w_13) + (p_23 − x_23 − w_23) = 2(x_3 + w_3),

which implies that allocation of common investment and operating costs follows 

(16) [(p_13 − x_13 − w_13)/2(x_3 + w_3)] + [(p_23 − x_23 − w_23)/2(x_3 + w_3)] = 1.

Of course, in the case where p_1 and p_2 are given from single-mission systems, then

p_13 = np_1 and p_23 = np_2
from (14). 

442 R. T. CROW

Knowledge of Historical or Simulated Trade-offs 

If historical data, e.g., Viet Nam, Korea, or World War II, is relevant to the missions in question, 
perhaps some trade-offs can be established. Force structure analysis might also be useful in establishing 
such trade-offs. For example, suppose that as a result of such analysis, a trade-off could be established 
such that the outcome of a campaign would have been the same regardless of whether an amount of 
"output" m_1 of mission 1 or m_2 of mission 2 were provided. Therefore the price of mission 2 relative to
mission 1 is 

p_2/p_1 = m_1/m_2 = k, or p_2 = kp_1.
The price of mission 1 performed by system 3 must be such that (15) will hold, that is 

p_13 + kp_13 = 2(x_3 + w_3) + x_13 + w_13 + x_23 + w_23,

which implies 

(17) p_13 = [2(x_3 + w_3) + x_13 + w_13 + x_23 + w_23]/(1 + k).

Since p_23 = kp_13, both p_13 and p_23 are determined and allocation of common costs follows (16).

Expert Opinion 

Expert opinion may be used to establish trade-offs needed for common cost allocation. This appears 
to be the implicit assumption underlying the two-ship method, in which judgment must be made as to 
which mission is "major" and which is "minor." The principal distinction is that under the proposal 
of this paper, if two missions are believed to be approximately equal in importance, common costs would 
be allocated according to relative prices, taking incremental investment costs and operating costs 
into account, rather than allocating all or none of the common costs to each mission. If one mission were thought
to be slightly more important for the system in question than the other, relative prices might be set such
that p_1/p_2 = 55/45, and so forth. The setting of an absolute price for one mission would follow (17) and
the allocation of common costs would follow (16). 
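To make the pricing step concrete, the following sketch applies (17) with a judged ratio p_13/p_23 = 55/45 (i.e., k = 45/55); the cost inputs are hypothetical:

```python
# A sketch of pricing by expert judgment via (17); all cost inputs here are
# hypothetical, chosen only to keep the arithmetic transparent.
def mission_prices(k, x3, w3, x13, w13, x23, w23):
    """Given the relative price k = p23/p13, return (p13, p23) from (17)."""
    total = 2 * (x3 + w3) + x13 + w13 + x23 + w23  # long-run marginal cost (15)
    p13 = total / (1 + k)
    p23 = k * p13  # the two prices sum back to the total, preserving (15)
    return p13, p23

p13, p23 = mission_prices(45 / 55, 40, 10, 4, 6, 7, 3)
print(round(p13, 2), round(p23, 2))  # → 66.0 54.0
```

With the prices fixed this way, the allocation of common costs then follows (16) exactly as in the text.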

A Simple Example of Mission and System Comparisons 

At the beginning of the paper, Table 1 presented the costs of achieving given amounts of output 
in three missions by five different systems — three single-mission and two multi-mission. To illustrate
the method, consider the basic problems of the paper: (1) which systems are least expensive in carrying 
out their multiple missions, and (2) what are the costs of multiple missions supplied by a single system. 



Since we have provided a single-mission system for each of our alternatives, we use the alternative 
system method of determining mission price. The information of Table 1 and the results of allocation 
are presented in Table 2. In the illustration, it will be assumed that the entries in the table are costs 
per unit and that marginal investment and operating costs are constant. Thus, the entries are approxi- 
mations of marginal costs, i.e., the w's and x's in the notation above. The functions, such as (6a) and
(6b) which would yield these parameters could be developed from statistical cost estimating, industrial 
engineering studies, analogy or even expert opinion, and should reflect all of the sources of costs, e.g., 
equipment, fuel, personnel, etc. that make the system operational. 

Beginning with System D, allocation follows from (14) and (16). That is,

n_D = (2x_D + w_D1 + w_D2)/[(x + w)_A + (x + w)_B] = (100 + 20 + 30)/(65 + 45) = 1.36.

TABLE 2. Marginal Costs of Missions 1, 2, and 3 for all Systems Before and After Allocation of Common Costs

n_D(p_A + p_B) = 2x_D + w_D1 + w_D2, or 1.36(65 + 45) = 100 + 20 + 30,

which implies

(n_Dp_A − w_D1)/2x_D + (n_Dp_B − w_D2)/2x_D = 1, or 0.69 + 0.31 = 1.00.

That is, 69 percent of the common cost is borne by Mission 1 and 31 percent by Mission 2. The long-run
marginal costs, with the common costs allocated to the specific missions, are therefore:

Y_D1 = 0.69(50) + 20 = 55 and Y_D2 = 0.31(50) + 30 = 45.

The same procedure is followed for System E, where

n_E = (2x_E + w_E2 + w_E3)/[(x + w)_B + (x + w)_C] = (2(60) + 30 + 10)/(45 + 60) = 1.52

and

(n_Ep_B − w_E2)/2x_E + (n_Ep_C − w_E3)/2x_E = 1, or 0.32 + 0.68 = 1.

The shares of common cost allocated to Missions 2 and 3, 32 and 68 percent, are then applied to the mar- 
ginal common costs, and the sums of the allocated marginal common cost plus the incremental cost for 
each system are 

Y_E2 = 0.32(60) + 30 = 49 and Y_E3 = 0.68(60) + 10 = 51.
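The two allocations above can be reproduced mechanically. The sketch below (Python, purely illustrative) uses only the inputs stated in the text: x_D = 50, w_D1 = 20, w_D2 = 30, x_E = 60, w_E2 = 30, w_E3 = 10, and single-mission prices 65, 45, and 60 for Systems A, B, and C:

```python
# Reproduces the System D and System E allocations via (14) and (16).
def allocate(x_common, w_1, w_2, price_1, price_2):
    """Scale single-mission prices by n (eq. 14); split common cost by (16)."""
    n = (2 * x_common + w_1 + w_2) / (price_1 + price_2)
    share_1 = (n * price_1 - w_1) / (2 * x_common)
    share_2 = (n * price_2 - w_2) / (2 * x_common)
    return n, share_1, share_2

nD, sD1, sD2 = allocate(50, 20, 30, 65, 45)   # System D (Missions 1 and 2)
nE, sE2, sE3 = allocate(60, 30, 10, 45, 60)   # System E (Missions 2 and 3)
print(round(nD, 2), round(sD1, 2), round(sD2, 2))  # → 1.36 0.69 0.31
print(round(nE, 2), round(sE2, 2), round(sE3, 2))  # → 1.52 0.32 0.68
```

Applying the rounded shares to the common investment costs reproduces (after rounding) the Y values in the text, e.g. Y_E2 = 0.32(60) + 30 = 49.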

Now that the costs have been allocated, let us consider the results. Taking System D first, we see 
that its cost for performing Mission 1 is less than for the single-mission system, System A. Its cost of 
performing Mission 2 is exactly the same as that of System B. Therefore, it would appear that System D 
is worthy of consideration for procurement and deployment for Missions 1 and 2, since its cost in each 
mission is, at worst, no greater than that of competing systems. 

Turning to System E, we find that it is more expensive in Mission 2 than are Systems B and D, i.e., 
its incremental cost exceeds its price. On the other hand, it is less expensive than C in its performance 
of Mission 3. Since its long-run marginal cost for Mission 2 exceeds that of B and D, System E will
never be used in that mission. This implies that all common costs are to be allocated to missions which 
it does perform, Mission 3 in this case. However, this means that the long-run marginal cost of Mission 3 
is now 70 instead of 51, and it too now exceeds the cost of the single-mission alternative. Thus, it will 
never be used according to our simplified analysis and does not appear to be a good candidate. 

This result has a paradoxical element in that even though E's marginal costs exceed those of the
single-mission alternatives when considered separately, the sum of its marginal costs is lower than
the sum of the single-mission alternatives. This, of course, is due to all costs being allocated to Mission 
3 since Mission 2 is not performed. Does the allocation of common costs thus lead us astray? Would 
we not be more likely to make good decisions on systems use and procurement if we simply compared 
total long-run marginal costs and ignored allocation? The answer in this particular example, at any rate
(and probably generally, too) is no. Consider the possible ways of accomplishing Missions 1, 2, and 3. 
The combinations and their associated costs are presented in Table 3. The result of considering al-

Table 3. Marginal Costs of Alternative Combinations of Systems Providing Three Missions
ternative combinations is that the least-cost combination is C and D. In the case of a direct comparison 
with Systems B and C alone, however, System E does have an advantage. In this case, what is implied is 
that cost penalties be accepted in Mission 2 in order to retain low-cost capabilities in Mission 3. In this


case, the marginal cost of Mission 2 would be set exactly equal to the price of the alternative system, 
reducing it from 49 to 45, and the marginal cost of Mission 3 would be raised from 51 to 55, i.e., four units 
of marginal common cost would be reallocated from Mission 2 to Mission 3. 

Common Cost Allocation, Time and Discount Rates 

Investments in military systems, like other public and private systems, have particular useful 
lives and are subject to particular rates of discount. Exactly how useful lives and discount rates are
determined is a difficult problem in its own right and will not be discussed here. Suffice it to say that
useful lives are functions of wear and tear and obsolescence, and discount rates reflect the terms under 
which the values of present and future costs and effectiveness are compared. 

Several measures have been employed for evaluating benefits and costs over time. Although each 
has its strengths and weaknesses, the best general measure appears to be the net discounted present 
value of an investment.* This is the measure to be employed for the purpose of illustrating how dis- 
counting and system life may be handled for multiple-mission systems. First, let us consider a single- 
mission system. Its net discounted present value (V) is: 

V = Σ_i d_i(pq_0 − C),

where d_i is the discount factor for year i, i.e., (1 + r)^−i, where r is the rate of interest in use for military
systems and is assumed, for the sake of simplicity, to be constant for all relevant years.† The other
variables are as described above. In words, then, V is the sum of a stream of net benefits, each year's 
entry being discounted by a greater amount than that of the year before.** 

If we extend this to the case of a system performing two missions, where the investment costs are com-
mon to both missions and the missions have identical useful lives and are subject to the same interest rates, we have

(18) V_n = Σ_i d_i[p_1q_0 + p_2q_0 − C_f(q_0) − w_1q_0 − w_2q_0].

The first-order condition for the maximization of the net discounted present value of the two missions
performed by the system is ∂V_n/∂q_0 = 0. This implies, following Boiteux [2, appendix],

(19) Σ_i d_i(p_1 + p_2) = Σ_i d_i(2x + w_1 + w_2).
Since both sides may be divided by Σ_i d_i, we see that (19) reduces to

p_1 + p_2 = 2x + w_1 + w_2.

* For a comparison of the more prominent measures, see Baumol [1, ch. 19].

† The interest rate is presumably based on some notion of social time preference or opportunity cost. For a discussion of
some of the issues involved see Prest and Turvey [7, pp. 697-700].

** The link from "benefits" to expenditure (pq_0) in our context is that pq_0 is the expenditure necessary to meet a particular
requirement, and it is the meeting of the requirement that is the benefit.


Thus, allocation in this case follows the same lines as (13), (16), etc.; and the interest rate and useful
life play no role.

If there are different useful lives of the system in different missions, and/or the interest rate differs,
a general solution of the allocation problem has not been found, although solutions have been found 
for specific cost functions, e.g., linear functions. The problem arises from the translation of units of 
output of particular missions to units of the multimission systems. The simplicity of the cost-allocation 
scheme proposed in this paper is due to Boiteux's demonstration that this can be done with great 
generality when time does not enter the picture. This breaks down, however, where the missions have 
different useful lives while performed by the multiple-mission system in question or have different 
interest rates, since it is not generally possible to translate them from specific missions to units of the 
multiple-mission system. It appears that the basic approach retains its validity, but not its simplicity. 


This paper has presented a new technique for allocating the common costs of multiple-mission 
systems. One major departure from existing practice is that the basis for allocation is to be found 
in the importance of the missions as reflected in their relative prices or, more generally, on an assess- 
ment of the relative abilities of a system to carry out alternative missions. The second major departure 
is that it uses marginal conditions rather than proportional or "either-or" allocations. Thus, unlike 
existing techniques, it is consistent with the principles of efficient resource allocation. 

These are not such drastic departures from current practice as they may seem, since some notion of
relative importance is implicit in the distinction of major from minor missions in the currently employed 
two-ship method of allocation. The two-ship method also adheres to a rough approximation to marginal 
principles. In a very real sense, what has been presented above can be regarded as a generalization of 
the two-ship method, as well as an explicit statement of the principles underlying it. 

The major problem in employing the proposed method is, of course, to develop means of measuring 
the relative prices of different missions supplied by a system. Several tentative suggestions have been 
offered which, crude as they are, should aid in achieving better allocation of common costs. In all like- 
lihood better means can and will be devised through experience if the proposed approach is employed. 


The author is grateful for discussions and correspondence with A. S. Rhode, J. T. Kammerer, K. F. 
Linder, Saul Gass, George Taylor, Kenneth Babka and William Baumol. I wish to thank them but also 
absolve them of any blame for whatever errors or misconceptions remain. 


[1] Baumol, W. J., Economic Theory and Operations Analysis, 2nd ed. (Prentice-Hall, Englewood Cliffs, N.J., 1965).
[2] Boiteux, M., "Peak Load Pricing," in J. R. Nelson (ed.), Marginal Cost Pricing in Practice (Prentice-Hall, Englewood Cliffs, N.J., 1964), chap. 4, pp. 59-90.
[3] Carlson, S., A Study on the Pure Theory of Production (Kelley and Millman, New York, 1956).
[4] Crow, R. T., "The Allocation of Common Costs of Multiple-Mission Systems," a report to Systems Analysis, Chief of Naval Operations, Contract No. N00014-70-C-0086, MATHEMATICA, Inc., Bethesda, Md. (Nov. 1971).
[5] Grey, J. C., Cost Analysis Methodology (Fire Support Study Working Paper No. 9), U.S. Naval Weapons Laboratory, Dahlgren, Va. (July 1970).
[6] Littlechild, S. C., "Marginal Cost Pricing with Joint Costs," Economic Journal LXXX, 323-335 (June 1970).
[7] Prest, A. R. and R. Turvey, "Cost-Benefit Analysis: A Survey," Economic Journal LXXV, 683-735 (Dec. 1965).



A. Charnes 

Center for Cybernetic Studies 
University of Texas 

W. W. Cooper 

School of Urban and Public Affairs 
Carnegie-Mellon University 


A complete analysis and explicit solution is presented for the problem of linear frac- 
tional programming with interval programming constraints whose matrix is of full row rank. 
The analysis proceeds by simple transformation to canonical form, exploitation of the 
Farkas-Minkowski lemma and the duality relationships which emerge from the Charnes- 
Cooper linear programming equivalent for general linear fractional programming. The 
formulations as well as the proofs and the transformations provided by our general linear 
fractional programming theory are here employed to provide a substantial simplification for 
this class of cases. The augmentation developing the explicit solution is presented, for 
clarity, in an algorithmic format. 


The linear fractional programming problem arises in many contexts with relatively simple con- 
straint sets, e.g., in the reduction of integer programs to knapsack problems, in attrition games, and in 
Markovian replacement problems, as well as in Neyman-Pearson rejection region selection problems.
Illustrative examples are provided by G. Bradley [5] or F. Glover and R. E. Woolsey [12],† J. Isbell
and W. Marlow [13], C. Derman [10], and M. Klein [16].

The linear fractional programming problem in all generality, and with all singular cases considered, 
was reduced in [8] to at most a pair of ordinary linear programming problems. This immediately made 
available all of the algorithms, interpretations, etc., that are associated with linear programming. 
This includes, we should note, access to any ordered field,** and any of the algorithms and the com- 
puter codes for linear programming problems which, by virtue of [8], thereby also become available 
for any problem in linear fractional form. Thus, with the development in [8], the work in linear frac- 
tional programming took a different form from its previous sole concern with the development of special 
types of algorithms for dealing with this kind of problem. 

This research was partly supported by a grant from the Farah Foundation and by ONR Contracts N00014-67-A-0126-0008 
and N00014-67-A-0126-0009 with the Center for Cybernetic Studies, The University of Texas. This report was also prepared 
as part of the activities of the Management Sciences Research Group at Carnegie-Mellon University under Contract N00014-
67-A-0314-0007, NR 047-048, with the U.S. Office of Naval Research. Reproduction in whole or in part is permitted for any pur-
pose of the U.S. Government. 

† See also E. Balas and M. Padberg [1].

** See, e.g., the development of the opposite sign theorem and related developments in [7]. 




In the present paper, we apply our reduction, as given in [8], to a general class of linear frac- 
tional problems, viz., those for which the constraint set is given by 


(1.1) a ≤ Ax ≤ b,

so that this part of the model is in "interval programming" form.* Here we shall assume that the
matrix A is of full row rank and the vectors a, b, and x meet the usual conditions for conformance. This
means that the constraint set is a parallelopiped. See the Final Appendix in [7].
Subject to conditions (1.1), we wish to 


(1.2) maximize R(x) = N(x)/D(x) = (c^T x + c_0)/(d^T x + d_0) ≢ constant,

so that we are now concerned with a problem of linear fractional programming. Because A is of full row 
rank it has a right inverse, A*, and hence we can write 


(1.3) AA* = I.

Now, setting

(1.4) x = A*y + Pz, P = I − A*A, z arbitrary,

we obtain

(1.5) max R(y, z) = (c^T A*y + c^T Pz + c_0)/(d^T A*y + d^T Pz + d_0),

subject to

a ≤ y ≤ b,

in place of (1.1) and (1.2).
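The decomposition can be checked numerically. The sketch below uses one standard choice of right inverse, A* = A^T(AA^T)^(-1) (the argument only requires AA* = I), on a hypothetical 2×3 matrix, with small plain-Python helpers standing in for a linear-algebra library:

```python
# Verifies AA* = I and AP = 0 for the right inverse A* = A^T (A A^T)^{-1};
# the 2x3 matrix A is hypothetical.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):  # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1, 0], [0, 1, 1]]                      # full row rank
A_star = matmul(transpose(A), inv2(matmul(A, transpose(A))))
AA_star = matmul(A, A_star)                     # should be the 2x2 identity
A_star_A = matmul(A_star, A)
P = [[(1 if i == j else 0) - A_star_A[i][j] for j in range(3)]
     for i in range(3)]                         # P = I - A*A
AP = matmul(A, P)                               # AP = 0, so A(A*y + Pz) = y
print([[round(v, 9) for v in row] for row in AA_star])  # → [[1.0, 0.0], [0.0, 1.0]]
print(all(abs(v) < 1e-9 for row in AP for v in row))    # → True
```

Since Ax = A(A*y + Pz) = y, the interval constraint a ≤ Ax ≤ b becomes simply a ≤ y ≤ b in the new variables.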

Because z is arbitrary,** unless

(1.6) c^T P = d^T P = 0,

**Observe that we have ruled out the case in which R is identically constant in (1.2). 


we shall obtain max R = ∞. In order to avoid repetitious arguments, however, we defer the proof of
this until we have discussed the situation c^T P = d^T P = 0. See Section IV, below.

Waiving this consideration, we shall next proceed to solve this problem explicitly and in all gener-
ality by means of the following three characterizations: the denominator D(x) is (1) bisignant, (2)
unisignant and nonvanishing, or (3) unisignant and vanishing on the constraint set. In (1), i.e., the bisig-
nant case, we shall show that max R(x) = ∞. Furthermore, we shall show how to identify this case at the
outset so that it may be discarded from further consideration. This will leave us with only cases (2)
and (3) to examine, where we shall proceed to transformations from which a one-pass numerical com-
parison of coefficients makes explicit the optimal value and solution.

After this has all been done, we shall then return to assumption (1.6) in a way that utilizes the pre- 
ceding developments. Finally we shall supply numerical examples to illustrate some of these situations 
and then we shall draw some conclusions for further research which will return to the remarks at the 
opening of this section. 


Employing assumption (1.6), our problem is 

(2.1) max R(y) = (c^T A*y + c_0)/(d^T A*y + d_0),

subject to

a ≤ y ≤ b,
in place of (1.5). Note, however, that here, and in the following, we shall slightly abuse notation by con- 
tinuing to use the symbols R, N, D, as in (1.1) and (1.2) even though we mean the transformed function, 
as in (2.1). 

Let D̄ and D̲ denote the maximum and minimum, respectively, of D over the constraint set. We note:

(a) D is bisignant if and only if D̄ > 0 and D̲ < 0.

(b) D is unisignant if and only if either D̄ ≤ 0 or D̲ ≥ 0.

In terms of y, since we can choose each component of y independently (see (1.5)), we can express D̄
and D̲ immediately as

(2.2) D̄ = Σ_+ d'_jb_j + Σ_− d'_ja_j + d_0 = max D(y)

D̲ = Σ_+ d'_ja_j + Σ_− d'_jb_j + d_0 = min D(y),

where d'_j = (d^T A*)_j is the jth element of d^T A* and "+" or "−" indicates that the summation is over only
the positive or negative d'_j.
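A minimal sketch of the test in (2.2), with hypothetical coefficients: the extremes of a linear function over a box are found coordinate by coordinate, and comparing their signs decides bisignance.

```python
# Computes D-bar and D-underbar of D(y) = d'y + d0 over the box a <= y <= b,
# per (2.2); the coefficients below are hypothetical.
def extremes(d, d0, a, b):
    d_max = d0 + sum(dj * (bj if dj > 0 else aj) for dj, aj, bj in zip(d, a, b))
    d_min = d0 + sum(dj * (aj if dj > 0 else bj) for dj, aj, bj in zip(d, a, b))
    return d_max, d_min

d_max, d_min = extremes([2, -1], 1, [0, 0], [3, 4])
print(d_max, d_min)               # → 7 -3
print(d_max > 0 and d_min < 0)    # bisignant case → True
```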

Let us first consider the bisignant situation. If we make the transformation of variables

(2.3) y_j − a_j = k_jξ_j, d'_j ≥ 0

b_j − y_j = k_jξ_j, d'_j < 0,

where the k_j > 0 will be suitably chosen, the constraints transform to

0 ≤ k_jξ_j ≤ b_j − a_j

(2.4) or

0 ≤ ξ_j ≤ δ_j = (b_j − a_j)/k_j.

Without loss of generality, the δ_j are positive, since otherwise ξ_j = 0 and does not enter into the optimization.

By choosing the k_j suitably, we obtain the form

(2.5) R(ξ) = (Σ_j γ_jξ_j + γ_0)/(Σ_+ ξ_j + Σ_− ξ_j − 1).

Note that Σ_+ δ_j + Σ_− δ_j > 1, since otherwise D would not be bisignant.


One of the following two cases must now hold:

CASE (i): for some ξ̄, 0 ≤ ξ̄ ≤ δ, such that D(ξ̄) = 0, we have N(ξ̄) ≠ 0,

or else

CASE (ii): for every ξ̄, 0 ≤ ξ̄ ≤ δ, such that D(ξ̄) = 0, we have N(ξ̄) = 0.

In Case (i), since N(ξ) is continuous, there is a neighborhood of ξ̄ in which N(ξ) is unisignant. Since

(2.6) Σ_+ ξ̄_j + Σ_− ξ̄_j = 1 < Σ_+ δ_j + Σ_− δ_j

and 0 ≤ ξ_j ≤ δ_j in the constraint set, we can choose ε_j ≥ 0, Σ ε_j > 0, so that 0 ≤ ξ̄ ± ε ≤ δ, sgn N(ξ̄ + ε) = sgn N(ξ̄ − ε), and D(ξ̄ + ε) > 0 > D(ξ̄ − ε). By approaching ξ̄ along the line segment from one of
ξ̄ + ε, ξ̄ − ε, we can make R(ξ) → ∞.

In Case (ii), we must have γ_j = 0 for all j such that d'_j = 0. For D(ξ) = 0 involves specifying only
the ξ_j for d'_j > 0 and d'_j < 0. If γ_{j0} ≠ 0 for some j_0 with d'_{j0} = 0, then, having made D(ξ) = 0, we could change the
value of N(ξ) by changing ξ_{j0}; thus D(ξ) = 0 would not imply N(ξ) = 0. We therefore drop the "+",
"−" notation in considering Case (ii) and rewrite it as: Σ_j γ_jξ_j + γ_0 = 0 whenever 0 ≤ ξ_j ≤ δ_j and
Σ_j ξ_j = 1.

By letting y_j = ξ_jy_0, y_0 ≥ 0, this becomes ±(Σ_j γ_jy_j + γ_0y_0) ≥ 0, whenever

Σ_j y_j − y_0 = 0

(2.7) −y_j + δ_jy_0 ≥ 0

y_j ≥ 0.

Note that the implication extends to y_0 = 0, since y_0 = 0 implies all y_j = 0.

We now apply the Farkas-Minkowski Lemma* to the pair of implications in (2.7) and obtain

(2.8) γ_j = μ+ − θ_j+ + ν_j+

γ_0 = −μ+ + Σ_j δ_jθ_j+ + ν_0+; θ_j+, ν_j+, ν_0+ ≥ 0,

for the first implication, viz., (Σ_j γ_jy_j + γ_0y_0) ≥ 0. For the second one, viz., −(Σ_j γ_jy_j + γ_0y_0) ≥ 0, we have

(2.9) −γ_j = μ− − θ_j− + ν_j−

−γ_0 = −μ− + Σ_j δ_jθ_j− + ν_0−; θ_j−, ν_j−, ν_0− ≥ 0.

Adding the first expressions in (2.8) and (2.9),

0 = (μ+ + μ−) − (θ_j+ + θ_j−) + (ν_j+ + ν_j−)

(2.10) or

θ_j+ + θ_j− = (μ+ + μ−) + (ν_j+ + ν_j−).

Adding the second pair,

0 = −(μ+ + μ−) + Σ_j δ_j(θ_j+ + θ_j−) + (ν_0+ + ν_0−)

(2.11) or

μ+ + μ− = Σ_j δ_j(θ_j+ + θ_j−) + ν_0+ + ν_0−.

Since each term on the right is nonnegative, we have

(2.12) μ+ + μ− ≥ 0.

Next, substituting from (2.10) into (2.11), we get

0 = −(μ+ + μ−) + Σ_j δ_j(μ+ + μ−) + Σ_j δ_j(ν_j+ + ν_j−) + ν_0+ + ν_0−

(2.13) or

0 = (Σ_j δ_j − 1)(μ+ + μ−) + Σ_j δ_j(ν_j+ + ν_j−) + ν_0+ + ν_0−.

Since the right-hand side is a sum of nonnegative terms, each of these must be zero. Moreover,

μ+ + μ− = 0 since Σ_j δ_j − 1 > 0

*See Appendix C in [7]. 


(2.14) ν_j+ = ν_j− = 0 since δ_j > 0.

By virtue of (2.14), and going back to (2.10),

(2.15) θ_j+ + θ_j− = 0.

Further, with θ_j+, θ_j− ≥ 0, we must have θ_j+ = θ_j− = 0 for all j. Therefore, γ_j = μ+ for all j, and γ_0 = −μ+,
so that we have

(2.16) R(ξ) = (Σ_j γ_jξ_j + γ_0)/(Σ_j ξ_j − 1) = μ+(Σ_j ξ_j − 1)/(Σ_j ξ_j − 1) = μ+ = constant.

In other words, Case (ii) can only occur in the trivial instance where the numerator is a constant multiple 
of the denominator. In this case, each coefficient in the numerator is the same multiple of the corre- 
sponding coefficient in the denominator, and this would have to be true in the original N(x), D(x) 
description and hence obvious upon comparing the initial coefficients. Since we have ruled out this 
very obvious case (see (1.2)), we have only max R(x) = ∞ when D(x) is bisignant on the constraint set.



The unisignant cases now remain to be considered. If D̄ ≤ 0 we multiply both N and D by −1
(thus not altering the value of R) and we are then reduced to D̲ ≥ 0. With this normalization, we
make a transformation of variables as in (2.3), 

(3.1) y_j − a_j = g_jξ_j, d'_j ≥ 0

b_j − y_j = g_jξ_j, d'_j < 0,

where the g_j > 0 will be suitably chosen. The constraint set will now be

(3.2) 0 ≤ ξ_j ≤ δ_j = (b_j − a_j)/g_j,


and, first considering the case where D̲ > 0, the g_j can be chosen so that the problem is

(3.3) max R(ξ) = (Σ γ_jξ_j + γ_0)/(Σ_+ ξ_j + Σ_− ξ_j + 1), 0 ≤ ξ_j ≤ δ_j.

In (3.3) the summation is only over "+" and "−" because, the denominator being positive, optimal
values for the ξ_j such that d'_j = 0 can be specified as ξ_j = 0 when γ_j < 0 and ξ_j = δ_j when γ_j ≥ 0, and these
new constant terms are assumed to be already contained in γ_0. By the reduction that we gave in [8],
however, the equivalent linear programming problem is

(3.4) max Σ_j γ_jη_j + γ_0η_0,

subject to

Σ_j η_j + η_0 = 1

−η_j + δ_jη_0 ≥ 0

η_j, η_0 ≥ 0,

where we can also note that these constraints imply η_0 > 0.
The dual to (3.4) is

(3.5) min u,

subject to

u + ω_j ≥ γ_j, j = 1, . . ., n

u − Σ_j δ_jω_j ≥ γ_0

ω_j ≥ 0.

We shall employ this dual in an essential manner to obtain our desired one-pass argument for obtaining
an optimum. At each step in the procedure, we shall have a solution to a less restrictive problem than
the dual problem and an associated primal feasible solution.

Suppose the γ's are renumbered so that γ_1 ≥ γ_2 ≥ . . . ≥ γ_n.
Then Case (i), γ_0 ≥ γ_1, has the immediately obvious primal solution η_0* = 1, η_j* = 0, and max R(ξ) = γ_0.

In the contrary case, Case (ii),

γ_1 ≥ . . . ≥ γ_p > γ_0 ≥ γ_{p+1} ≥ . . . ≥ γ_n,

we build up an algorithm based on the dual problem in which we choose u^q at the qth step to satisfy

(3.6) u^q + ω_j^q = γ_j, j = 1, . . ., q

u^q − Σ_{j=1}^q δ_jω_j^q = γ_0.

Using the first q equations to obtain w? in terms of « 9 and substituting in the last equation, we obtain 


(3.7) "*(i+is*)= IrA + yo 

and hence 

(3.8) u«= -> 

1 + lSj 


Thus, u^q is a convex combination of γ_0, γ_1, . . ., γ_q with proportionality constants 1, δ_1, . . ., δ_q.
Note that if u^q ≤ γ_q, then u^q, ω_j^q, j = 1, . . ., q, satisfy the first q constraints plus the "γ_0" constraint of the dual problem; hence they satisfy a less restrictive problem than the dual.
If we take

(3.9) η_0^q = 1/(1 + Σ_{j=1}^{q} δ_j)

(3.10) η_j^q = δ_j/(1 + Σ_{j=1}^{q} δ_j),  j = 1, . . ., q

       η_j^q = 0,  j > q,

then η_j^q, j = 0, . . ., n, is a feasible solution to the primal problem and Σ_j γ_j η_j^q + γ_0 η_0^q = u^q (by substitution in (3.4) and comparison with (3.8)).

Hence, whenever we can get u^q, ω_j^q feasible for the dual problem, we will have a primal feasible solution η^q with the same functional value, and thus we will have an optimal pair of dual solutions. This, plus the equivalences maintained via (3.1) and our theory from [8], justifies the development that we detail as follows:

To start, 

(3.11) u^1 = (γ_0 + γ_1 δ_1)/(1 + δ_1).

(Note, u^1 > γ_0 since γ_1 > γ_0 and u^1 is a proper convex combination of γ_1, γ_0.)

We check: Is u^1 ≥ γ_2?

If yes: we are done, with

u^1 = u*

ω_1^1 = ω_1* = γ_1 − u*,  ω_j^1 = ω_j* = 0,  j > 1

η_0^1 = η_0* = 1/(1 + δ_1)

η_1^1 = η_1* = δ_1/(1 + δ_1)

η_j^1 = η_j* = 0,  j > 1.

If no: u^1 < γ_2, and we continue to

u^2 = (γ_0 + γ_1 δ_1 + γ_2 δ_2)/(1 + δ_1 + δ_2)

    = ((γ_0 + γ_1 δ_1)/(1 + δ_1)) ((1 + δ_1)/(1 + δ_1 + δ_2)) + (δ_2/(1 + δ_1 + δ_2)) γ_2

    = u^1 ((1 + δ_1)/(1 + δ_1 + δ_2)) + (δ_2/(1 + δ_1 + δ_2)) γ_2 < γ_2,

since u^1 < γ_2.

Next, is u^2 ≥ γ_3?

If yes: we are done, with the substitutions indicated by (3.6).

If no: u^2 < γ_3 and we continue to u^3.

This process must stop by u^p at the latest since

u^p = (γ_0 + Σ_{j=1}^{p} γ_j δ_j)/(1 + Σ_{j=1}^{p} δ_j) > γ_0 ≥ γ_{p+1} ≥ . . . ≥ γ_n.

Thus max R(ξ) = u^s, where s is the least positive integer such that u^s ≥ γ_{s+1}, and

u^s = (γ_0 + Σ_{j=1}^{s} γ_j δ_j)/(1 + Σ_{j=1}^{s} δ_j).

This concludes the case D > 0. 
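The one-pass procedure for the case D > 0 can be sketched computationally. The following Python rendering of (3.8) and its stopping rule is our own illustration (the function name and interface are assumptions, not part of the original development):

```python
from fractions import Fraction

def max_fractional_d_positive(gamma0, gammas, deltas):
    """One-pass maximum of R(xi) for the case D > 0.

    gammas must already be renumbered in nonincreasing order, with
    deltas the matching upper bounds.  Computes u^q of (3.8)
    incrementally and stops at the least s with u^s >= gamma_{s+1}.
    """
    if not gammas or gamma0 >= gammas[0]:
        return Fraction(gamma0)          # Case (i): max R = gamma_0
    num, den = Fraction(gamma0), Fraction(1)
    for q, (g, d) in enumerate(zip(gammas, deltas)):
        num += Fraction(g) * d           # running gamma_0 + sum gamma_j delta_j
        den += Fraction(d)               # running 1 + sum delta_j
        u = num / den                    # u^q, the convex combination (3.8)
        if q + 1 == len(gammas) or u >= gammas[q + 1]:
            return u                     # least s with u^s >= gamma_{s+1}
    return num / den
```

On the data of the worked example in section VI (γ = (3/2, 0, −2), γ_0 = 1/2, δ = (6, 4, 2)) this returns 19/14, in agreement with (6.10).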

The remaining case has D ≥ 0, with D vanishing somewhere on the constraint set. By making a transformation of variables as in (3.1), and choosing the g_j > 0 suitably, we obtain

max R(ξ) = (Σ_j γ_j ξ_j + γ_0)/(Σ_j ξ_j),  0 ≤ ξ_j ≤ δ_j.

We may dispose of two situations immediately:

(i) γ_0 > 0: then R(ξ) → ∞ as ξ → 0.

(ii) γ_0 = 0: here the dual problems are

max Σ_j γ_j η_j          min u

with Σ_j η_j = 1         with u + ω_j ≥ γ_j

η_j − δ_j η_0 ≤ 0        −Σ_j δ_j ω_j ≥ 0

η_j, η_0 ≥ 0             ω_j ≥ 0.

An optimal solution pair is ω_j* = 0, u = γ_1 = max_j γ_j, and η_1* = 1, η_j* = 0, j ≥ 2. The maximum of R(ξ) is thus γ_1.

The remaining instance is

(iii) γ_0 < 0: γ_1 ≥ . . . ≥ γ_p ≥ 0 ≥ γ_{p+1} ≥ . . . ≥ γ_n.

The dual linear programs can now be written

max Σ_j γ_j η_j + γ_0 η_0     min u

with Σ_j η_j = 1              with u + ω_j ≥ γ_j

η_j − δ_j η_0 ≤ 0             −Σ_j δ_j ω_j ≥ γ_0

η_j, η_0 ≥ 0                  ω_j ≥ 0.

As before, we define

u^q = (Σ_{j=1}^{q} γ_j δ_j + γ_0)/(Σ_{j=1}^{q} δ_j).

It may be easily verified that

u^q = u^{q−1} (1 − δ_q/Σ_{j=1}^{q} δ_j) + (δ_q/Σ_{j=1}^{q} δ_j) γ_q,

or, what is the same thing, u^q is a proper convex combination of u^{q−1} and γ_q. Thus, if u^r < γ_{r+1}, then u^r < u^{r+1} < γ_{r+1} as in our earlier argument for D > 0.

The steps of our process are as before: if u^q ≥ γ_{q+1} we are done; otherwise, we test u^{q+1} against γ_{q+2}. At worst we are done with

u^p = (Σ_{j=1}^{p} γ_j δ_j + γ_0)/(Σ_{j=1}^{p} δ_j).

As before, u^s = max R(ξ), where s is the least positive integer such that u^s ≥ γ_{s+1}.

IV. R(y, z)

Returning to (1.5) we consider the remaining cases in which either 

(a) d^T P = 0, c^T P ≠ 0

or

(b) d^T P ≠ 0.

In case (a), since z is arbitrary we can make ±c^T P z → ∞, hence max R(y, z) → ∞. In case (b), we are in the bisignant denominator situation since we can make d^T P z → ±∞. The argument of the bisignant section of this paper (with the additional variables z) now shows that max R = ∞ since we have ruled out, a priori, the case in which R = constant.


Some examples may help to fix and sharpen some of the preceding developments. Thus consider

(5.1) max R(x) = (3x_1 − x_3 + 4)/(2x_2)

with

−1 ≤ x_1 + x_3 ≤ 2

1 ≤ x_2 ≤ 5,

and the variables x_1, x_2, x_3 are otherwise unrestricted.

Here we have*

A = [1 0 1]
    [0 1 0]

together with a generalized inverse A# of A and the associated matrix P = I − A#A.

To exhibit the development in full detail we next write

(5.2) c^T A# y = 3y_1,  c_0 = 4,  d^T A# y = 2y_2,  d_0 = 0

(5.3) c^T P z = −3z_3,  d^T P z = 0.

Evidently in this case d^T P = 0, as witness the last expression in (5.3). On the other hand, c^T P ≠ 0, as witness c^T P z = −3z_3 in (5.3). Hence condition (a) of the preceding section obtains and we have R → ∞ even though


−1 ≤ y_1 ≤ 2

1 ≤ y_2 ≤ 5.

This occurs because z is arbitrary and can be freely chosen in

max R(y, z) = (c^T A# y + c^T P z + c_0)/(d^T A# y + d^T P z + d_0) = (3y_1 − 3z_3 + 4)/(2y_2),

*It may be observed that A# here is not unique.



which is the specialization of (1.5) to this case. Of course, the result R → ∞ in (5.1) can be confirmed by direct inspection, since negative values of x_3 may be selected along with increasingly positive values of x_1 as required in order to maintain the first interval programming constraint.
This last remark suggests that an adjunction such as

(5.6) 0 ≤ x_3 ≤ 1

will convert (5.1) to a problem with a finite maximum. This yields an A for an interval programming format with the full row rank condition fulfilled as in


A = [1 0 1]        A# = [1 0 −1]
    [0 1 0]             [0 1  0]
    [0 0 1]             [0 0  1]

On the other hand, A is also of full column rank so that we also have A# = A^{−1} and

P = I − A#A = 0.

Hence both of the conditions specified in (1.6) are fulfilled, viz.

c^T P = d^T P = 0

for any c^T and d^T.

The problem to be solved is now written

(5.8) max R(y) = (c^T A# y + c_0)/(d^T A# y + d_0) = (3y_1 − 4y_3 + 4)/(2y_2)

with

−1 ≤ y_1 ≤ 2

1 ≤ y_2 ≤ 5

0 ≤ y_3 ≤ 1.


Evidently the solution to this problem is y_1* = 2, y_2* = 1, and y_3* = 0, so that

max R(y) = 10/2 = 5.
To obtain the corresponding components of x we simply utilize (1.4) with P = 0 to obtain

(5.9) (x_1, x_2, x_3)^T = A# y = (2, 1, 0)^T.

As may be seen, these x values satisfy (5.1) with (5.6) adjoined. They are evidently also maximal with R(x) = 5 since x_3 can no longer be negative and x_2 and x_1 are at their lower and upper limits, respectively.

In some cases the solutions, as above, may be obvious but, of course, this cannot always be expected. Recourse to the preceding development, however, will produce the wanted results in any case, as we illustrate by now developing the above example, along with the related background materials, in some detail as follows:

Because the denominator is unisignant we utilize section III. Observing that d_2 = 2 in the denominator and hence is nonnegative, we have recourse only to the first part of (3.1) in order to write

(6.1) y_1 − a_1 = g_1 ξ_1

      y_2 − a_2 = g_2 ξ_2

      y_3 − a_3 = g_3 ξ_3,

where, respectively, a_1 = −1, a_2 = 1, and a_3 = 0, via (5.8). The development from (3.1) to (3.2) applies to this case as

(6.2) 0 ≤ ξ_1 ≤ δ_1 = (b_1 − a_1)/g_1

      0 ≤ ξ_2 ≤ δ_2 = (b_2 − a_2)/g_2

      0 ≤ ξ_3 ≤ δ_3 = (b_3 − a_3)/g_3.

The insertion of (6.1) into the functional then produces

(6.3) [3(g_1 ξ_1 + a_1) − 4(g_3 ξ_3 + a_3) + 4]/[2(g_2 ξ_2 + a_2)] = [3g_1 ξ_1 − 4g_3 ξ_3 + 1]/[2(g_2 ξ_2 + 1)]

via (5.8). Choosing

(6.4) g_1 = 1/2,  g_2 = 1,  g_3 = 1/2

and setting

(6.5) γ_1 = 3/2,  γ_2 = 0,  γ_3 = −2,  γ_0 = 1/2

gives the denominator form wanted for (3.3) as:

(6.6) max R(ξ) = ((3/2)ξ_1 + 0ξ_2 − 2ξ_3 + 1/2)/(ξ_1 + ξ_2 + ξ_3 + 1)

0 ≤ ξ_1 ≤ 6

0 ≤ ξ_2 ≤ 4

0 ≤ ξ_3 ≤ 2.

The transformation ξ_j = η_j/η_0 from our previously developed theory [8] then produces the following example for (3.4):

(6.7) max (3/2)η_1 + 0η_2 − 2η_3 + η_0/2

subject to

η_1 + η_2 + η_3 + η_0 = 1

η_1 − 6η_0 ≤ 0

η_2 − 4η_0 ≤ 0

η_3 − 2η_0 ≤ 0

η_1, η_2, η_3, η_0 ≥ 0,

where the g_j values of (6.4) combine with (6.2) to give δ_1 = 6, δ_2 = 4, and δ_3 = 2, as required for the application of (3.4). The corresponding dual, which our previous theory also gives access to, is

(6.8) min u

subject to

u + ω_1 ≥ 3/2

u + ω_2 ≥ 0

u + ω_3 ≥ −2

u − 6ω_1 − 4ω_2 − 2ω_3 ≥ 1/2

ω_1, ω_2, ω_3 ≥ 0.

This, of course, is the application of (3.5) to the present example. 

Since γ_1 = 3/2 exceeds γ_0 = 1/2 we are in the situation of case (ii) following (3.5). Thus, preserving our subscript identifications from (6.5), we have

(6.9) γ_1 > γ_0 ≥ γ_2 ≥ γ_3

in our present situation. We therefore see that the first application of the suggested algorithm should suffice. (See the remarks which conclude the case D > 0 in section III.)
Applying (3.11) now produces 

(6.10) u^1 = (1/2 + (3/2)·6)/(1 + 6) = 19/14 > γ_0 = 1/2.

Evidently this also formally satisfies the condition that u^1 equals or exceeds the immediate successor of γ_1 in (6.9). Hence, we have

(6.11) u^1 = u* = 19/14

       ω_1^1 = ω_1* = γ_1 − u* = 3/2 − 19/14 = 2/14

       ω_2^1 = ω_2* = 0

       ω_3^1 = ω_3* = 0,

which satisfy the constraints of (6.8), as may be verified, with min u = u* = 19/14.
Moving to the primal problem via (3.9), 


(6.12) η_0^1 = 1/(1 + δ_1) = 1/(1 + 6) = 1/7

       η_1^1 = δ_1/(1 + δ_1) = 6/(1 + 6) = 6/7

and all other η_j^1 = 0. See (3.10).

Inserting these values for the corresponding η_j in (6.7), we see that all constraints are satisfied with

(3/2)η_1 + η_0/2 = (3/2)(6/7) + (1/2)(1/7) = 19/14,

the same as the value of u*, thereby confirming optimality. In fact, as our theory [8] prescribes, we need merely apply the expressions

η_0 = 1/7,  ξ_1 = η_1/η_0 = (6/7)/(1/7) = 6


with all other η_j = 0, and then reverse the development from (6.6) to (6.7) in order to verify that this value is also optimal for

(6.13) max R(ξ) = ((3/2)·6 + 1/2)/(6 + 1) = 19/14

with

ξ_1 = δ_1 = 6

0 ≤ ξ_2 = 0 ≤ δ_2 = 4

0 ≤ ξ_3 = 0 ≤ δ_3 = 2.

Evidently we can now directly effect substitutions in (6.1) and obtain

(6.14) y_1 = g_1 ξ_1 + a_1 = 3 − 1 = 2

       y_2 = g_2 ξ_2 + a_2 = 0 + 1 = 1

       y_3 = g_3 ξ_3 + a_3 = 0 + 0 = 0.

Then we can proceed exactly as in (5.9) to obtain the values x_1 = 2, x_2 = 1, x_3 = 0 which we previously observed to be optimal.
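The arithmetic in (6.2) through (6.14) can also be verified mechanically. The following check, written by us in exact rational arithmetic and not part of the original text, reproduces δ = (6, 4, 2), u* = 19/14, the dual feasibility of (6.11), and the recovery (6.14):

```python
from fractions import Fraction as F

# (6.4)-(6.5): scaling factors, interval endpoints, and coefficients
g = [F(1, 2), F(1), F(1, 2)]
a = [F(-1), F(1), F(0)]
b = [F(2), F(5), F(1)]
delta = [(bj - aj) / gj for aj, bj, gj in zip(a, b, g)]
assert delta == [6, 4, 2]                          # (6.2) with (6.4)

gamma, gamma0 = [F(3, 2), F(0), F(-2)], F(1, 2)

# (6.10): u^1 = (gamma_0 + gamma_1 delta_1)/(1 + delta_1)
u = (gamma0 + gamma[0] * delta[0]) / (1 + delta[0])
assert u == F(19, 14)

# (6.11): omega_1* = gamma_1 - u*, the rest zero; feasibility in (6.8)
omega = [gamma[0] - u, F(0), F(0)]
assert all(u + w >= gj for w, gj in zip(omega, gamma))
assert u - sum(d * w for d, w in zip(delta, omega)) >= gamma0

# (6.14): xi_1 = delta_1 = 6, xi_2 = xi_3 = 0 recovers y = (2, 1, 0)
xi = [delta[0], F(0), F(0)]
y = [gj * xj + aj for gj, xj, aj in zip(g, xi, a)]
assert y == [2, 1, 0]
```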


Although the development in this paper proceeded, for clarity, by algorithmic format, we summarize below the explicit solution in tabular format for direct theoretical interpretation and utilization.

Summary of Solution

Denominator D        max R(ξ)

(a) positive         u^s = (γ_0 + Σ_{j=1}^{s} γ_j δ_j)/(1 + Σ_{j=1}^{s} δ_j), least s with u^s ≥ γ_{s+1}

(b) nonnegative      u^s = (γ_0 + Σ_{j=1}^{s} γ_j δ_j)/(Σ_{j=1}^{s} δ_j), least s with u^s ≥ γ_{s+1}


Before proceeding any further we should probably point up, again, the crucial role played by the 
general theory (including the transformations and proof procedures) which we introduced in [8] for 
making explicit contacts between linear fractional and ordinary linear programming, in all generality 
and exact detail. These transformations are also utilized in the present paper and the theory is also 
extended via the duality (and other) characterizations given in the preceding text. These are joined together here in the algorithmic-format proofs we have just illustrated by example and commentary. Other uses can undoubtedly also be made of this theory and the preceding extensions via the
passage (up and back) between linear fractional and ordinary linear programming that is now possible. 

Our general theory has been used by others, too, to extend or simplify parts of linear fractional 
programming en route to effecting the contacts with ordinary linear programming that are thereby 
obtained. The work of Zionts [22] should perhaps be singled out as being most immediately in line 
with the R = °° results presented in this paper. Zionts' development is directed only toward simplifying 
matters by focusing on eliminating cases for linear fractional programming which are either deemed 
to be unwanted or of little interest for practical applications. 

His developments cease as soon as the contacts with ordinary linear programming are identified via our theory, which he, like others, utilizes for this purpose.

We have effected the developments in this paper in a way that makes contact with interval linear 
programming.* An opening for further two-way flows is thereby also provided. The resulting junctures 
should also help to guide subsequent developments in the more special situations that now seem 
to invite consideration in the future. Finally, the possibilities for dealing with specially structured 
problems (such as those observed at the start of the present paper) should also be observed explicitly in 
this conclusion, partly because the theory we have now developed and presented should also be a helpful 
guide to these additional cases which are important in their own right. Thus we can now conclude 
here by referring to our opening remarks. 


We wish to thank W. Szwarc of the University of Wisconsin for comments which helped us to 
improve the exposition in the manuscript for this article. 


[1] Balas, E. and M. Padberg, "Equivalent Knapsack-Type Formulations of Bounded Integer Pro- 
grams," Carnegie-Mellon University (Sept. 1970). 

[2] Ben-Israel, A. and A. Charnes, "An Explicit Solution of a Special Class of Linear Programming 
Problems," Operations Research 16, 1166-1175 (1968). 

[3] Ben-Israel, A., A. Charnes, and P. D. Robers, "On Generalized Inverses and Interval Linear 
Programming," Proceedings of The Symposium on Theory and Applications of Generalized 
Inverses, held at Texas Technological College, Lubbock, Tex. (Mar. 1968). 

[4] Ben-Israel, A. and P. D. Robers, "A Decomposition Method for Interval Linear Programming," 
Management Science Vol. 16, No. 5 (Jan. 1970). 

[5] Bradley, G., "Transformation of Integer Programs to Knapsack Problems," Yale University, 
Rept. No. 37 (1970). To appear in Discrete Mathematics. 

*See [2], [3], [4].



[6] Chadda, S. S., "A Decomposition Principle for Fractional Programming," Opsearch 4, 123-132 

[7] Charnes, A. and W. W. Cooper, Management Models and Industrial Applications of Linear 

Programming (John Wiley & Sons, Inc., New York, 1961). 
[8] Charnes, A. and W. W. Cooper, "Programming with Linear Fractional Functionals," Nav. Res. 

Log. Quart. 9, 181-186 (Sept.-Dec. 1962).
[9] Dorn, W. S., "Linear Fractional Programming," IBM Research Report RC-830 (Nov. 27, 1962). 
[10] Derman, C, "On Sequential Decisions and Markov Chains," Management Science, 9, 16-24 (1962). 
[11] Gilmore, P. C. and R. E. Gomory, "A Linear Programming Approach to the Cutting Stock Problem," 

Operations Research 11, 863-888 (1963).
[12] Glover, F. and R. E. Woolsey, "Aggregating Diophantine Equations," University of Colorado 

Report 70-4 (Oct. 1970). 
[13] Isbell, J. R. and W. H. Marlow, "Attrition Games," Nav. Res. Log. Quart. 3, 71-93 (1956). 
[14] Jagannathan, R., "On Some Properties of Programming in Parametric Form Pertaining to Frac- 
tional Programming," Management Science, 12, 609-615 (1966). 
[15] Joksch, H. C, "Programming with Fractional Linear Objective Function," Nav. Res. Log. Quart. 

[16] Klein, M., "Inspection-Maintenance-Replacement Schedules under Markovian Deterioration," 

Management Science 9, 25-32 (1962). 
[17] Martos, B., "Hyperbolic Programming," translated by A. and V. Whinston, Nav. Res. Log. Quart.

[18] Robers, P. D., "Interval Linear Programming," Ph.D. Thesis submitted to Northwestern Uni- 
versity (Evanston, 111.) Dept. of Industrial Engineering and Management Sciences (1968). 
[19] Robers, P. D. and A. Ben-Israel, "A Suboptimization Method for Interval Linear Programming," 

Systems Research Memo No. 206, Northwestern University (Evanston, 111.), The Technological 

Institute (June 1968). 
[20] Swarup, K. "Linear Fractional Functionals Programming," Operations Research 13, 1029-1036 

[21] Wagner, H. M. and J. S. C. Yuan, "Algorithmic Equivalence in Linear Fractional Programming," 

Management Science 14, 301-306 (Jan. 1968). 
[22] Zionts, S., "Programming with Linear Fractional Functionals," Nav. Res. Log. Quart. 15, 449-452

(Sept. 1968). 


Linus Schrage 

Graduate School of Business 
University of Chicago 


When implicit enumeration algorithms are used for solving integer programs, a form
of primal decomposition can be used to reduce the number of solutions which must be im-
plicitly examined. If the problem has the proper structure, then under the proper decomposi-
tion a different enumeration tree can be defined for which the number of solutions which
must be implicitly examined increases with a power of the number of variables rather than
exponentially. The proper structure for this kind of decomposition is that the southwest
and northeast corners of the constraint matrix be zero, or equivalently that the matrix be
decomposable except for linking columns. Many real traveling salesman, plant location,
production scheduling, and covering problems have this structure.


Consider an integer program for which the constraint matrix has the form shown in Figure 1. The important feature is that the columns of A_2 link two otherwise independent subproblems.

In general, a set of columns is a linking set if deleting that set partitions all remaining columns

Figure 1.

into two disjoint, nonempty sets, A_1 and A_3, such that there do not exist two columns, one in A_1 and one in A_3, both having a nonzero entry in the same row.

For simplicity, assume that all variables are required to be either 0 or 1, and that there are n variables in each of the blocks A_1, A_2, and A_3. There are then 2^{3n} solutions to be implicitly examined.


We assume that an enumerative scheme similar to that described in Geoffrion [4] is to be used. We can think of A_2 as being the master problem. If we fix the values of the variables in block A_2 to some set of values, then the variables in blocks A_1 and A_3 constitute independent subproblems and can be solved separately. Solving one of these problems (A_1 or A_3) requires us to implicitly examine 2^n solutions. Each of these two problems must be solved for each possible setting of the variables in A_2. Therefore, we must effectively enumerate 2^n(2^n + 2^n) = 2^{2n+1} solutions. If we define a node to be the setting of some variable to one of its two values, then perhaps a better measure of computational




difficulty is the number of nodes in the enumeration tree. The nondecomposition complete enumeration tree has approximately 2^{3n+1} nodes in it. By use of decomposition the complete enumeration tree has 2^n(2^{n+1} + 2^{n+1}) + 2^{n+1} = 2^{2n+2} + 2^{n+1} nodes. In this sense, decomposition reduces the difficulty of the problem by a factor of approximately 2^{n−1}.
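The counts just derived are easy to tabulate. The short script below is our own illustration (function names are ours) and reproduces the 2^{n−1} reduction factor:

```python
def nodes_without_decomposition(n):
    # complete enumeration tree over 3n 0-1 variables
    return 2 ** (3 * n + 1)

def nodes_with_decomposition(n):
    # master A2 first (2^(n+1) nodes), then for each of its 2^n
    # settings two independent subtrees of 2^(n+1) nodes each
    return 2 ** n * (2 ** (n + 1) + 2 ** (n + 1)) + 2 ** (n + 1)

n = 10
ratio = nodes_without_decomposition(n) / nodes_with_decomposition(n)
# ratio is roughly 2^(n-1) = 512 for n = 10
```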


One can argue that this stairstep structure exists in several classes of integer programs. In a traveling salesman problem based on the United States, the variables in A_2 might correspond to the choice of arcs connecting cities in the Midwest. For each set of these arcs chosen, one should be able to complete the eastern and western legs of the tour independently.

Similar arguments can be given for the plant location problem based on cities in the United States. The variables in A_2 would correspond to the decisions of which plants to build in the Midwest. For each set of Midwest plant decisions one would expect to be able to solve the eastern and western plant location problems independently.

The Wagner-Whitin [10] dynamic lotsize problem is a special case of the plant location problem 
which has an even more obvious decomposition structure. Once we decide to produce in a particular 
period, the production plans for subsequent and for previous periods can be solved independently. By 
discarding variables which obviously cannot be in the optimal solution, the 12-period problem given in 
Wagner and Whitin [10] can be decomposed into a stairstep structure with seven blocks. 

Another class of integer programs are covering problems. One situation giving rise to covering 
problems is in the assignment of vehicles to routes. Each city in the service area must be "covered" 
by at least one route. Arguments similar to those for the traveling salesman and plant location problems 
can then be made. 


Consider a constraint matrix with the stairstep structure shown in Figure 2, with blocks A_1, A_2, . . ., A_k arranged along the diagonal so that consecutive blocks share only linking columns. Again, for simplicity, we assume the problem is composed only of 0-1 variables.
Figure 2. 
This problem can be decomposed into a hierarchy of masters and subproblems. Assume there is 
a total oik blocks. Choose as the highest level master, block number [k/2]+ 1, where [x] is the greatest 


integer no larger than x. This divides the problem into two independent subproblems with [A/2 
[(k — l)/2] blocks each. These two subproblems can themselves be decomposed in similar 
If this decomposition is carried on in this recursive fashion we will then have approxinu 
levels of decomposition. In doing the enumeration we will first set the variables in block [A/! 
For each setting of these variables we must solve the independent subproblems composed o 
1 to [k/2] and blocks [A-/2] +2 to k. The first of these subproblems is solved by first setting the variables 
in block [[A/2]/2] + 1. This divides the problem further into independent subproblems. This enumera- 
tion method is applied recursively to each independent subproblem created. The result is a binary tre< 
of independent subproblems. 
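The order in which blocks are fixed under this recursive scheme can be sketched directly. The function below is our own illustration of the rule "the master of blocks lo..hi is block lo + [(hi − lo + 1)/2]," which reduces to [k/2] + 1 for the full range 1..k:

```python
def block_order(lo, hi):
    """Preorder listing of blocks: each range's master comes first,
    then the two independent subranges, giving a binary tree of
    subproblems with about log2(k) levels."""
    if lo > hi:
        return []
    m = lo + (hi - lo + 1) // 2        # master block of this range
    return [m] + block_order(lo, m - 1) + block_order(m + 1, hi)

# For k = 7 blocks: block 4 is the top master, then 2 and 6, etc.
print(block_order(1, 7))               # [4, 2, 1, 3, 6, 5, 7]
```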

For simplicity assume that each block contains n 0-1 variables. Let s(j) be the number of terminal nodes in a decomposition enumeration tree with j levels. If we increase the number of blocks such that the number of levels of decomposition increases from j to j + 1, then each of the 2^n solutions to the highest level master partitions the remainder of the problem into two j-level problems. We then have that s(j + 1) = 2^n · 2 · s(j), where s(1) = 2^n. Thus, s(j) = (2^{n+1})^j/2 and the number of terminal nodes in the decomposition tree of a problem with k blocks is (1/2)(2^{n+1})^{log_2 k} = (1/2)k^{n+1}.

A perhaps more accurate measure of the size of the enumeration tree is the total number of nodes, terminal and intermediate, in the tree. Let t(j) be the total number of nodes in the decomposition enumeration tree with j levels of decomposition. The total number of nodes in a simple binary tree with 2^n terminal nodes is 2^{n+1} − 1, or approximately 2^{n+1}. We can now argue as before to claim that approximately t(j + 1) = 2^n · 2 · t(j) + 2^{n+1} = 2^{n+1}(t(j) + 1). Now, t(1) is approximately 2^{n+1}. Thus t(j) is approximately

Σ_{i=1}^{j} (2^{n+1})^i = [2^{n+1} − (2^{n+1})^{j+1}]/[1 − 2^{n+1}],

which is approximately (2^{n+1})^j for n and j large. Thus, the total number of nodes in the decomposition tree of a problem with k blocks is approximately (2^{n+1})^{log_2 k} = k^{n+1}. We now see that both the number of solutions and the total number of nodes which must be implicitly examined increase with a power of the number of variables if the block size remains constant as the problem size is increased. The number of solutions which would have to be implicitly examined under no decomposition is 2^{kn}, and the number of nodes in the tree under no decomposition is 2^{kn+1} − 1. For example, if n = 10 and k = 3, then decomposition decreases the size of the tree by a factor of about 500. If k = 7 the factor is on the order of 10^{12}.
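These recursions are easy to evaluate exactly. The sketch below is ours (using the exact binary-tree count 2^{n+1} − 1 for t(1)) and reproduces the factor of about 500 for n = 10, k = 3:

```python
def s(j, n):
    # terminal nodes: s(1) = 2^n, s(j+1) = 2^n * 2 * s(j)
    val = 2 ** n
    for _ in range(j - 1):
        val *= 2 ** (n + 1)
    return val

def t(j, n):
    # total nodes: t(1) = 2^(n+1) - 1, t(j+1) = 2^(n+1) * (t(j) + 1)
    val = 2 ** (n + 1) - 1
    for _ in range(j - 1):
        val = 2 ** (n + 1) * (val + 1)
    return val

n = 10
no_decomp = 2 ** (3 * n + 1) - 1       # k = 3 blocks, i.e., j = 2 levels
print(no_decomp // t(2, n))            # about 500
```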

Edmonds [3] made the interesting suggestion that a problem be considered tractable if and only if one can exhibit an algorithm for its solution whose running time is bounded by a polynomial in the size of the problem. There is no known algorithm for general integer programs which is polynomial bounded. In fact, Karp [5] gives "theorems which strongly suggest, but do not imply, that these problems, as well as many others, will remain intractable perpetually." It is interesting therefore that the class of decomposable integer programs just described can be solved by a polynomial bounded algorithm, namely, by simply searching the decomposition enumeration tree.

The Wagner-Whitin [10] example problem, for example, is a 0-1 problem with 12 variables. The first variable is required to be 1, so there are 2^{11} = 2,048 feasible solutions to the problem and 4,095 nodes in the simple enumeration tree. Using the seven-block decomposition mentioned earlier, the equivalent of only 96 solutions need be examined implicitly. The number of nodes in this decomposition tree is 191. The computation involved in an enumerative algorithm should be approximately proportional to the number of nodes examined. Complete enumeration of the 191-node decomposition tree in this case is not an unreasonable solution method.


In the analysis thus far we have assumed that a linking block decomposed a problem into two 
independent subproblems. Much the same analysis could be done if a set of linking columns de- 
composed a problem into more than two independent subproblems. See, for example, Figure 3. 

Figure 3.

For each of the settings of the variables in A_1 which must be examined, the subproblems A_2, A_3, A_4, and A_5 can be solved independently.


The more common form of decomposition structure discussed in the linear programming literature involves a set of submatrices with no rows in common, except that there is a set of constraints at the top linking all the submatrices together. It may be that problems which are formulated with that structure actually have a structure like Figure 2. The rows in common between A_1 and A_2, A_3 and A_4, etc., could be moved to the top, and then one would have the more common form of decomposition.

The decomposition approach described may also be useful for problems where the natural formulation is with linking constraints, by realizing that a linking constraint can be replaced by a linking column and two nonlinking constraints. Consider the linking constraint:

Σ_{j=1}^{2n} a_j x_j = b.

Suppose that the problem is decomposable into two subproblems, each with n variables, except for this constraint. This linking constraint can be replaced by one linking variable, y_1, and two nonlinking constraints as follows:

Σ_{j=1}^{n} a_j x_j − y_1 = 0

y_1 + Σ_{j=n+1}^{2n} a_j x_j = b.


Assume again that the x_j must be either 0 or 1 and all the a_j are integer. In the worst possible case we must examine 2^n different values for y_1, and the size of the tree under decomposition actually doubles.


We would expect the number of different values of y_1 to be examined to be much less than 2^n. In a covering problem, for example, the a_j are either 0 or 1. Then, in the worst possible case we must examine n different values of y_1. The number of terminal nodes in the decomposition tree is then n(2^n + 2^n) = n2^{n+1} versus 2^{2n} terminal nodes in the nondecomposition tree. If n = 20, for example, then the number of terminal nodes is reduced by a factor of approximately 26,000.
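For the covering case the comparison can be checked directly; this is our own sketch, with an illustrative function name:

```python
def terminal_nodes_with_linking_variable(n, values_of_y):
    # each candidate value of y splits the problem into two
    # independent n-variable subproblems of 2^n solutions each
    return values_of_y * (2 ** n + 2 ** n)

n = 20
decomposed = terminal_nodes_with_linking_variable(n, n)   # at most n values of y
undecomposed = 2 ** (2 * n)
print(undecomposed // decomposed)       # 26214, i.e., roughly 26,000
```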

If one considers the usual decomposition structure studied in linear programming where there are p independent subproblems row-linked by a single master block with q rows, then the interested reader should be able to convince himself that the proper generalization of the transformation described in the previous paragraph is to reformulate the problem as a multilevel decomposition problem with p − 1 sets of linking columns and approximately log_2 p levels of decomposition. Each linking set would consist of q columns.


We have considered only pure 0-1 integer programs thus far. The analysis generalizes fairly natu- 
rally to the case where some of the variables may take on any integer value. The analysis also extends 
to the case where some of the variables are not required to be integral if fixing the integer variables 
in a block completely implies values for the continuous variables in that block. 


A disadvantage of this decomposition method is that the order in which variables are placed in 
the enumeration tree is partially specified beforehand. The importance of flexibility in the tree search, 
especially in specifying the order of addition of variables to the enumeration tree has been pointed 
out [2], [4], [6], [7], [9]. For example, if there is a variable which can take on the value of either 0 or 1 in the optimal solution (i.e., there are alternate optima), then the amount of searching required by a branch-and-bound or implicit enumeration algorithm is approximately doubled if this variable is placed first in the tree rather than last. If there are unimportant variables in the highest level master problem, the performance of an implicit enumeration algorithm could be appreciably degraded by the decomposition approach.

Another apparent disadvantage is the additional bookkeeping which must be done to keep track 
of the decomposition tree. 

The first disadvantage can be alleviated somewhat by adapting the flexible tree-search procedure described in Tuan [9] and Bravo, et al. [2]. This approach allows one to partially reorder a tree without re-searching branches already searched or discarding branches yet to be searched. With respect to the second disadvantage, the bookkeeping scheme is not significantly more complex than conventional schemes.

The additional restrictions on the branching scheme are that 

(a) we cannot branch on a variable i unless all variables in the master of the subproblem contain- 
ing i have been fixed; 

(b) we can backtrack any time the bound on the subproblem currently being solved is worse than 
the value of some feasible solution to the subproblem for the current setting of the variables in the 
master problem; or 

(c) we can backtrack any time that the overall bound is worse than the value of some known 
feasible solution to the entire problem. 



A computer program was written incorporating the decomposition method into a backtrack 
implicit enumeration scheme. The program was similar to those described by Geoffrion [4]. The revised
simplex method of linear programming with explicit inverse was used to calculate bounds. After any 
variable was forced to or 1, dual pivots were performed to return to feasibility. After any variable 
in a backtrack step was released from or 1, primal pivots were performed to return to optimality. 
The rule for selecting the next variable to force to or 1 was simply to force to the nearer integer 
value that basic variable which was closest in value to an integer. 

The implementation was inefficient in at least two ways: 1) the LP portion worked with a full inverse 
at all times. That is, even though only a subproblem was being optimized, pivots were done in the 
full inverse. For the problems considered, each subproblem had half as many rows as the full problem, 
therefore each pivot in the implemented procedure may have taken as much as four times as much 
work as really necessary. 2) Natural integrality in the master problem was not taken advantage of. 
Before a subproblem can be searched, each variable in its master must be fixed to an integer value. 
If some variable Xj in the master was fixed to 1 and would have remained at the value 1, even if not 
constrained to the value 1, all through the enumeration of all subproblems, then it would not be neces- 
sary to examine Xj — in the backtrack step. Most integer programming routines take advantage of 
this natural integrality. The program here did not. 

A class of decomposable integer programs was derived based on a problem known as IBM— 1. A 
description of this problem can be found in Trauth and Woolsey [8]. The problem is a general integer 
program with seven variables and seven constraints. Two 0-1 variables were required to represent 
each of the original seven variables. A series of eight problems with the stairstep structure shown in 
Figure 2 was created. Each problem was composed of three blocks. Each of the two outer blocks con- 
sisted of a copy of the IBM-1 constraint matrix. In eight different problems, the middle linking block 
consisted of the first k columns of IBM-1, where k ranged from zero to seven. All problems had 14 rows.






Figure 4.



These problems were solved on an IBM 360/65. This machine uses multiprogramming; thus run 
times are random variables. The number of pivots is therefore perhaps a more reliable estimate of 
computational difficulty because most of the work is involved in pivoting. This statistic is plotted versus 
number of linking columns in Figure 4. As expected, the advantage of decomposition tends to diminish 
as the number of linking columns increases. The same program was used to solve the problems without 
decomposition; the program was simply not told that the problem was decomposable. Recall that under 
decomposition, each pivot should require less work than under no decomposition. 

Other computational statistics are displayed in Table 1. The link-edit time column is included 
only to give an indication of the variability in run times. The link-edit step required exactly the same 
amount of work in every run. 

Table 1. A Comparison of Decomposition with No Decomposition 

[Table 1 reports, for each number of linking columns, the run time (sec), the number of nodes examined, and the link-edit time (sec); the individual entries are not reproduced here.]

The run times under decomposition are perhaps longer than they need be in practice because 
a full solution report was printed each time a better integer solution was obtained to any subproblem. 
The runs without decomposition would typically produce three solution reports while a run under 
decomposition would typically produce, say, nine solution reports. A full solution report requires a 
fair amount of work because the inverse must be multiplied through the full matrix to calculate such 
things as reduced costs and dual prices. 

The decomposition method for these problems seems to be fairly robust in that the amount of
work seems to increase less than linearly with the number of linking columns for this class of problems.
For a small number of linking columns, decomposition is clearly superior.


[1] Balas, E., "An Additive Algorithm for Solving Linear Programs with Zero-One Variables," Opera- 
tions Research 13, 517-546 (1965). 

[2] Bravo, A., J. G. Gomez, L. Lustosa, L. Schrage, and N. Pizzolato, "A Mixed Integer Programming 
Code," CMSBE Report No. 7043, University of Chicago (Sep. 1970). 

[3] Edmonds, J., "Paths, Trees, and Flowers," Canadian J. Math. 17, 449-467 (1965).

[4] Geoffrion, A. M., "An Improved Implicit Enumeration Approach for Integer Programming,"
Operations Research 17, 437-454 (May-June 1969).

[5] Karp, R. M., "Reducibility Among Combinatorial Problems," presented at ORSA National 
Convention, New Orleans, La. (Apr. 26, 1972). 


[6] Salkin, H. M., "On the Merit of Generalized Origin and Restarts in Implicit Enumeration," Opera- 
tions Research 18, 549-555 (May-June 1970). 

[7] Spielberg, K., "Plant Location with Generalized Search Origin," Management Science 16, 165-178 (1969).
[8] Trauth, C. A. and R. E. Woolsey, "Integer Linear Programming: A Study in Computational 
Efficiency," Management Science 15, 481-493 (May 1969). 

[9] Tuan, Nghiem Ph., "A Flexible Tree-Search Method for Integer Programming Problems," Opera- 
tions Research 19, 115-119 (Jan.-Feb. 1971). 
[10] Wagner, H. and T. M. Whitin, "Dynamic Version of the Economic Lot Size Model," Management
Science 5, 89-96 (Oct. 1958).


S. A. Gustafson

The Royal Institute of Technology 

Stockholm, Sweden 


K. O. Kortanek 

Carnegie-Mellon University 

Pittsburgh, Pennsylvania 


Many optimization problems occur in both theory and practice when one has to optimize 
an objective function while an infinite number of constraints must be satisfied. The aim 
of this paper is to describe methods of handling such problems numerically in an effective 
manner. We also indicate a number of applications. 


In order to illustrate the subject of this paper, we immediately give a few examples of the class
of problems we wish to study.

EXAMPLE 1.1: One wants to determine a cumulative distribution function G which corresponds
to a stochastic variable that can assume values inside a finite closed interval $[a, b]$. In $N$ points
$t_1, t_2, \ldots, t_N$ one has measured the values of G and obtained $g_1, g_2, \ldots, g_N$. We want to approximate
G on $[a, b]$ by a polynomial P of a degree less than a certain number $n$. It is natural to write

$P(t) = \sum_{r=1}^{n} y_r t^{r-1}$

and require that $P(a) = 0$, $P(b) = 1$, and $P'(t) \ge 0$. Since we cannot hope that P passes through the
measured points, we try to solve the problem

$\inf_{y_1, y_2, \ldots, y_n} \sum_{j=1}^{N} \Big( \sum_{r=1}^{n} y_r t_j^{r-1} - g_j \Big)^2,$

subject to

$\sum_{r=1}^{n} y_r a^{r-1} = 0,$

$\sum_{r=2}^{n} (r-1) y_r t^{r-2} \ge 0, \quad a \le t \le b,$

$\sum_{r=1}^{n} y_r b^{r-1} = 1.$

*This research was supported in part by National Science Foundation Grant GK 31-833. 



It is easily shown by examples that this problem may or may not be feasible, depending on the given
data.

This problem appeared when one wanted to study size distributions of grains in gravel deposits
in order to get a suitable raw material for concrete production (Gustafson-Martna [27]). In the refer-
enced paper, one did not attempt to solve exactly the problem indicated above, but used piecewise
polynomial interpolation through the measured points instead, in order to meet the monotonicity
requirement.
EXAMPLE 1.2: Bojanic-DeVore [4] discuss the problem of one-sided approximation of a given
function from below, while maximizing a linear functional. Their problem can be stated as follows:
Let $[a, b]$ be a closed interval and $u_1, u_2, \ldots, u_n$ $n$ given functions which form a Cebysev system (for a
definition see, e.g., Karlin-Studden [32] or Gustafson [20]). Let further $\phi$ be continuous on $[a, b]$.
The task is:

$\max \sum_{r=1}^{n} y_r \int_a^b u_r(t)\,dt,$

subject to

$\sum_{r=1}^{n} y_r u_r(t) \le \phi(t), \quad t \in [a, b].$

Bojanic-DeVore give some unicity and existence results and identify the solutions with certain quadrature rules. Methods for finding computational solutions to this problem are given in Gustafson [20].
EXAMPLE 1.3: Let again $[a, b]$ be a closed interval, $\omega$ a positive function defined on $[a, b]$,
and $\phi$ continuous on the same interval. We want to determine a polynomial of degree less than $n$ which
approximates $\phi$ as well as possible in the weighted maximum norm determined by $\omega$. That is, we
want to solve the problem:

Compute $\min \eta$ subject to

$\omega(t) \Big| \sum_{r=1}^{n} y_r t^{r-1} - \phi(t) \Big| \le \eta, \quad t \in [a, b].$

We can write this task in the equivalent form:

Compute $\min \eta$ subject to

$\sum_{r=1}^{n} y_r t^{r-1} \omega(t) - \eta \le \phi(t)\omega(t) \quad$ and $\quad -\sum_{r=1}^{n} y_r t^{r-1} \omega(t) - \eta \le -\phi(t)\omega(t).$

This problem is well known, and for $\omega(t) = 1$ a solution is constructed computationally by means of
Remez' algorithms (see, e.g., Cheney [9]). For numerical purposes, an approximate solution often
is satisfactory (Powell [41], Gustafson-Dahlquist [23]).


EXAMPLE 1.4: Kantorovich-Rubinshtein [31] and Rubinshtein [43] give examples of production 
scheduling problems, where an infinite number of linear constraints must be met. Also Vershik- 
Temel't [48] propose a process of finding a sequence of approximate finite linear programming prob- 
lems whose optimal values converge to the optimal value of the infinite linear programming problem. 

EXAMPLE 1.5: Gorr-Kortanek [18], Gustafson-Kortanek [24] and Gustafson-Kortanek [25] give 
examples of models for study of air pollution problems, where an infinite number of linear constraints 
must be fulfilled over a two-dimensional set S. 

Additional Examples and Problems 

As illustrated above, moment problems stem from problems in approximation and minimization, 
see Shohat-Tamarkin [45], Rivlin-Shapiro [42], Shapiro [44], Karlin-Studden [32] and others. This
leads to applications of infinite programming techniques to analysis, Duffin [13], [14], Kretschmer 
[36], [37], Duffin-Karlovitz [15], including the development of orthogonality theorems and similar 
results with application to the theory of integral equations. While applications of the moment problem 
to statistics and probability theory are well-known, recent problems in these areas have been brought 
into contact with the theory of moments by Krafft [35]. Interesting applications of generalized moment 
problems have also been made in theoretical physics, see Baker-Gammel [1]. See also the classification 
theory of Ben Israel-Charnes-Kortanek [3] for linear programming problems over closed convex sets
in locally convex spaces and applications to approximation theory. 

DEFINITION 1.1: We denote by problem D the general task:

Compute $\min_{y_1, y_2, \ldots, y_n} G(y_1, y_2, \ldots, y_n),$

subject to

(1.1) $\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S.$

Here S is a given set and $u_1, u_2, \ldots, u_n$, $\phi$ are given functions defined over S. The objective function
G is also given and must be defined for all vectors $y_1, y_2, \ldots, y_n$ which satisfy (1.1).

We will refer to problem D as a semi-infinite program. The fact that $n$ is finite is crucial in our
analysis. We observe that all the examples, 1.1 through 1.5, are instances of problem D.

Many well-known optimization problems are subsumed by problem D. If S has a finite number of 
elements, we arrive at mathematical programming tasks of various kinds. Note in particular that if 
G also is linear, we get linear programs. For a discussion of these problems, the reader is referred to 
the textbooks by Charnes-Cooper [6] and Dantzig [10]. 

We want, instead, to discuss the case when S has an infinite number of elements. The general 
theory of semi-infinite programming embracing such problems is given in several papers by Charnes,
Cooper, and Kortanek [7], [8]. Included in their theory is the development of regularization techniques, 
analogous to those of finite linear programming, which we use in our computational developments. 

In section 2 of this paper we present the parts of the general theory which are relevant for our 
purpose. We discuss the intimate connection between problem D and certain so-called moment problems.
We establish that the solutions of problem D can be found if one can solve a system of a finite
number of scalar equations in a finite number of unknowns. This system is nonlinear even if G is 
linear and its numerical solution is a nontrivial task. In section 3 we propose an algorithm to be used in 
practical computational work. The basic idea underlying our algorithm is that D is approximated by a 
problem with a finite number of constraints. The solution hereby obtained is then used as an initial 
approximation which is then improved by Newton-Raphson iterations (other iterative methods might be 
considered). Thus the solution of semi-infinite programs can be achieved by combining well-known 
standard techniques. In particular problems, special short cuts can be used in order to facilitate the 
computations (see, e.g., Gustafson [20], [21]). We also discuss questions in connection with error 
estimation. We treat the problem of assessing how perturbations of input data influence the optimal 
solution and corresponding value of the objective function. 


DEFINITION 2.1: Let K be the set of vectors $y = (y_1, y_2, \ldots, y_n)$ which satisfy (1.1). We refer
to K as the constraint set of D.

We note that K is always contained in $R^n$, the $n$-dimensional vector space (independent of the
nature of S). If K is nonempty, it is convex. We give three simple instances of D in order to illustrate
different situations that can occur.

EXAMPLE 2.1: Find

$\inf y_1 + y_2,$

subject to

$y_1 x + y_2 x^2 \ge 1, \quad x \in [0, 1]$

(at $x = 0$ the constraint reads $0 \ge 1$, so the problem is inconsistent).

EXAMPLE 2.2: Find

$\inf y_1,$

subject to

$y_1 + y_2 x \ge \sqrt{x}, \quad x \in [0, 1]$

(the inf-value is 0, but it is not attained).

EXAMPLE 2.3: Find

$\inf y_1 + \tfrac{1}{2} y_2,$

subject to

$y_1 + y_2 x \ge \frac{1}{1+x}, \quad x \in [0, 1]$

(the inf-value is 3/4, and is attained for $y_1 = 1$, $y_2 = -1/2$).
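The three situations can be verified numerically; the sketch below checks each example on a fine grid of [0, 1].

```python
# Numerical illustration of Examples 2.1-2.3 on a grid of [0, 1].
import math
xs = [i / 1000 for i in range(1001)]

# Example 2.1: at x = 0 the constraint y1*x + y2*x^2 >= 1 reads 0 >= 1,
# so no (y1, y2) whatsoever is feasible.
assert not any(y1 * 0.0 + y2 * 0.0**2 >= 1.0
               for y1 in (0.0, 1.0, 1e6) for y2 in (0.0, 1.0, 1e6))

# Example 2.2: (y1, y2) = (eps, 1/(4 eps)) is feasible for every eps > 0,
# since the minimum of eps + x/(4 eps) - sqrt(x) is 0 (at x = 4 eps^2).
# Hence inf y1 = 0, although y1 = 0 itself is infeasible.
for eps in (0.1, 0.01):
    y1, y2 = eps, 1.0 / (4.0 * eps)
    assert all(y1 + y2 * x >= math.sqrt(x) - 1e-9 for x in xs)

# Example 2.3: y = (1, -1/2) is feasible and gives the value 3/4.
assert all(1.0 - 0.5 * x >= 1.0 / (1.0 + x) - 1e-12 for x in xs)
print(1.0 + 0.5 * (-0.5))  # objective value 0.75
```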

Another example is treated in Lemma 3.5 in this paper. It is quite obvious that problems such as Examples 2.1 and 2.2 above will cause difficulties in actual machine computation. We therefore want to
specialize problem D, but in such a way that we still retain wide generality, and consider only what
we call regularized problems. (Compare Gustafson [20].)

DEFINITION 2.2: We denote by problem $D_F$ a special case of problem D having the properties
1, 2, 3 below:

1. S can be written as $S_K \cup S_F$, where $S_K$ is a compact subset of the $k$-dimensional vector space
($k < \infty$) and $S_F$ has a finite number of elements. The conditions

(2.1) $\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_F,$

must be such that they alone restrict $y$ to a bounded region $K_F$ of $R^n$. We require, moreover, that (2.1)
is consistent.

2. We require further that $u_1, u_2, \ldots, u_n$ and $\phi$ are continuous over $S_K$ and that $u_1, u_2, \ldots, u_n$
meet Krein's condition: there exist constants $c_1, c_2, \ldots, c_n$ such that

$\sum_{r=1}^{n} c_r u_r(x) > 0, \quad x \in S_K.$

3. G must be differentiable and convex on K.

We note that for a task of type $D_F$, K is a compact bounded set. Hence the minimum value Z is
always attained.

Instead of Krein's condition, Gustafson [20] requires that $u_1, u_2, \ldots, u_n$ form a Cebysev system (i.e., a unisolvent set). Unfortunately, this cannot be done if $k > 1$. We quote the following classical
result from Buck [5] (the notations are slightly changed; $C(\Omega)$ denotes the space of functions continuous over $\Omega$).

LEMMA 2.1: If $\Omega$ is a compact connected set and $C(\Omega)$ contains a unisolvent linear subspace of
finite dimension at least 2, then $\Omega$ is homeomorphic to the unit interval or the unit circumference.

Therefore, many results in this section will be generalizations of those in Gustafson-Kortanek-Rom [26].

In order not to get unnecessarily complicated formulae, we treat only the case where the inequalities
corresponding to $x \in S_F$ are of the type (2.2) below. We want first to treat the case


$G(y_1, y_2, \ldots, y_n) = \sum_{r=1}^{n} \mu_r y_r,$

and then generalize the results to a general $D_F$-problem. Denote the problem by $D_{F_0}$:

$Z = \min \sum_{r=1}^{n} y_r \mu_r,$

subject to

$\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_K,$ and

(2.2) $F_1 \le y_r \le F_2, \quad r = 1, 2, \ldots, n.$

We want to show that the optimal solution of $D_{F_0}$ can be found by solving a nonlinear system of equations. In order to derive this, we apply the theory of semi-infinite programming. For this purpose we
need a few notations.

Let $\Sigma$ be the set of all finite regular measures $\alpha$ on $S_K$ which meet the integrability conditions

$\int_{S_K} |u_r(x)|\,d\alpha(x) < \infty, \quad r = 1, 2, \ldots, n,$

$\int_{S_K} |\phi(x)|\,d\alpha(x) < \infty.$

Denote by $M_n$ the convex cone in $R^n$:

$M_n = \Big\{ \sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n) \;\Big|\; \sigma_r = \int_{S_K} u_r(x)\,d\alpha(x), \ r = 1, 2, \ldots, n, \ \alpha \in \Sigma \Big\}.$

Introduce now the problems $P_\mu$ and $D_\mu$:

$P_\mu$: Compute $\sup_{\alpha} \int_{S_K} \phi(x)\,d\alpha(x),$

subject to

$\int_{S_K} u_r(x)\,d\alpha(x) = \mu_r, \quad r = 1, 2, \ldots, n, \quad \alpha \in \Sigma.$

$D_\mu$: Compute $\inf \sum_{r=1}^{n} y_r \mu_r,$

subject to

$\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_K.$

From Karlin-Studden [32, p. 472] we quote the result (the notations are slightly changed).

LEMMA 2.2: Let $\mu = (\mu_1, \mu_2, \ldots, \mu_n)$ belong to the interior of $M_n$. Then the optimal values of
$P_\mu$ and $D_\mu$ are equal.

Arguing as in Gustafson [20], we can associate with $D_{F_0}$ the semi-infinite dual problem $P_{F_0}$:

$\max \int_{S_K} \phi(x)\,d\alpha(x) + \sum_{r=1}^{n} (F_1 v_r^+ - F_2 v_r^-),$

subject to

$\int_{S_K} u_r(x)\,d\alpha(x) + v_r^+ - v_r^- = \mu_r, \quad r = 1, 2, \ldots, n,$

$\alpha \in \Sigma, \quad v_r^+ \ge 0, \quad v_r^- \ge 0, \quad r = 1, 2, \ldots, n.$


$P_{F_0}$ is always feasible, but may be unbounded. Exactly as in Gustafson [20] and Charnes-Cooper-Kortanek [8], we establish:

LEMMA 2.3: Let $D_{F_0}$ have interior points. Then $P_{F_0}$ and $D_{F_0}$ are consistent and bounded. They
assume their optimal values, which are equal.

We also obtain, following [8], [20]:

LEMMA 2.4: Among the optimal solutions of $P_{F_0}$, there are some which correspond to point-masses with a finite number of mass-points.

Furthermore:

LEMMA 2.5: Let $D_{F_0}$ have interior points and let an optimal solution of $P_{F_0}$ be given by

i) a pointmass distribution with mass $m_i$ at $x^i$, $i = 1, 2, \ldots, q$,

ii) $q^+$ of the $v_r^+$ are positive, namely $v_{j_1}^+, v_{j_2}^+, \ldots, v_{j_{q^+}}^+$,

iii) $q^-$ of the $v_r^-$ are positive, namely $v_{g_1}^-, v_{g_2}^-, \ldots, v_{g_{q^-}}^-$.

Let $y = (y_1, y_2, \ldots, y_n)$ be an optimal solution of $D_{F_0}$. Then the following equations are satisfied:

(2.3) $\sum_{i=1}^{q} m_i u_r(x^i) + v_r^+ - v_r^- = \mu_r, \quad r = 1, 2, \ldots, n.$

(2.4) $\sum_{r=1}^{n} y_r u_r(x^i) = \phi(x^i), \quad i = 1, 2, \ldots, q.$

(2.5) $y_{j_s} = F_1, \quad s = 1, 2, \ldots, q^+.$

(2.6) $y_{g_s} = F_2, \quad s = 1, 2, \ldots, q^-.$

REMARK: We have also $q + q^+ + q^- \le n$ and $m_i > 0$, $i = 1, 2, \ldots, q$. The $q$ column-vectors
$u_1, u_2, \ldots, u_q$ in (2.3), given by $u_i = (u_1(x^i), u_2(x^i), \ldots, u_n(x^i))$, are linearly independent.

The relations (2.3), (2.4), (2.5), and (2.6) are necessary conditions for finding optimal solutions.
They can be supplemented by further conditions.

Let $u_1, u_2, \ldots, u_n$ and $\phi$ have continuous partial derivatives of the first order. Define Q by

$Q(x) = \sum_{r=1}^{n} y_r u_r(x).$

Then relation (2.4) takes the form

$Q(x^i) = \phi(x^i), \quad i = 1, 2, \ldots, q.$

If $x$ belongs to $S_K$ we have

$Q(x) \ge \phi(x).$

Let $x^i$ be such that there is a nonzero vector $h$ meeting the conditions:

(2.7) $x^i + h \in S_K, \quad x^i - h \in S_K.$

Define $\psi$ by

$\psi(t) = Q(x^i + th) - \phi(x^i + th), \quad -1 \le t \le 1.$

$\psi$ has a continuous derivative with respect to $t$ on $[-1, 1]$, $\psi(t) \ge 0$ and $\psi(0) = 0$. Hence $\psi'(0) = 0$.
This observation can be utilized to derive further constraints on Q in the following manner. Determine for each point $x^i$ a system of linearly independent vectors $h$ which meet the requirements
(2.7). Denote these vectors by $h_1, h_2, \ldots, h_{l_i}$ (if there is none, put $l_i = 0$). We always have $l_i \le k$, the
dimensionality of $S_K$. We note in passing that $l_i < k$ at boundary points only. Denote the directional
derivative along $h_j$ by $D_j$. Then we must conclude

(2.8) $D_j(Q(x^i) - \phi(x^i)) = 0, \quad j = 1, 2, \ldots, l_i \ (\text{if } l_i \ge 1), \quad i = 1, 2, \ldots, q.$

Equations (2.3), (2.4), (2.5), (2.6), and (2.8) form a nonlinear system, and the optimal solution of the
original problem can be found by solving it. The unknowns are the $q$ masses $m_1, m_2, \ldots, m_q$, the $n$
scalars $y_1, y_2, \ldots, y_n$, the vectors $x^1, x^2, \ldots, x^q$, and the numbers $v_1^+, v_2^+, \ldots, v_n^+, v_1^-, v_2^-, \ldots, v_n^-$.
We mention now two special cases: in the problems treated by Gustafson [20] and Gustafson-Rom-Kortanek [26], we have $k = 1$. Hence $l_j = 0$ at boundary points, $l_j = 1$ at interior points. If $S_K$ is strictly
convex (e.g., a nondegenerate ellipsoid), then $l_j = 0$ at boundary points and $l_j = k$ in the interior.

We next extend our results to nonlinear functions G. From Kortanek-Evans [34, p. 889] we get
(after appropriate changes of notation):

LEMMA 2.6: Let G be a continuously differentiable function defined on an open convex set W
in $R^n$. Consider the two problems:

$(I)\ \min G(y)$ when $y \in K$; $\quad (I^*)\ \min y^T (\nabla G)_{y=y^*}$ when $y \in K$,

where K is a closed convex set in W. Then $y^*$ is optimal for $I$ if and only if $y^*$ is optimal for $I^*$, provided
either one of the following conditions holds:

(a) G is pseudo-convex;

(b) G is quasi-convex and $(\nabla G)_{y=y^*} \ne 0$.

Using this lemma, we realize that if G meets condition (a) or (b) of the lemma, we can replace $D_F$
by a linear problem with the objective function $G^*$ defined by

$G^*(y_1, y_2, \ldots, y_n) = \sum_{r=1}^{n} \Big( \frac{\partial G}{\partial y_r} \Big)_{y=y^*} y_r,$

where $y^* = (y_1^*, y_2^*, \ldots, y_n^*)$ is an optimal solution of $D_F$. In our nonlinear system (2.3), (2.4), (2.5), (2.6),
(2.8), we should replace $\mu_r$ by $(\partial G / \partial y_r)_{y=y^*}$ in (2.3) in order to allow for nonlinear objective functions. (The
remaining equations were derived independently of the objective function G.)

DEFINITION 2.3: We denote by system NL the nonlinear system of equations obtained by combining Equations (2.3) through (2.6) with (2.8). If G is nonlinear, $\mu_r$ in (2.3) should be replaced by $(\partial G / \partial y_r)_{y=y^*}$ as
described above.
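Lemma 2.6 can be illustrated with a small numerical check. The objective and constraint set below are hypothetical (not from the paper): for the convex function G(y) = y1^2 + y2^2 over K = {y : y1 + y2 >= 2, 0 <= yi <= 3}, the minimizer y* = (1, 1) of G also minimizes the linearized objective y^T (grad G)(y*) over a dense sample of K.

```python
# Hypothetical illustration of Lemma 2.6: the minimizer of a convex G over K
# also minimizes the linearization y^T (grad G)(y*) over K.
import itertools
ystar = (1.0, 1.0)
grad = (2.0 * ystar[0], 2.0 * ystar[1])              # (grad G)(y*) = (2, 2)
K = [(a / 30, b / 30) for a, b in itertools.product(range(91), repeat=2)
     if a / 30 + b / 30 >= 2.0]                      # dense sample of K
G = lambda y: y[0] ** 2 + y[1] ** 2
lin = lambda y: y[0] * grad[0] + y[1] * grad[1]
assert all(G(y) >= G(ystar) - 1e-9 for y in K)       # y* minimizes G on K
assert all(lin(y) >= lin(ystar) - 1e-9 for y in K)   # ... and the linearization
print(G(ystar), lin(ystar))
```

Note that $I^*$ may have several optima (here, the whole face $y_1 + y_2 = 2$); the lemma asserts that $y^*$ is among them.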

As an illustration, compute

$\min 4y_1 + \tfrac{2}{3}(y_4 + y_6),$

when

$y_1 + x_1 y_2 + x_2 y_3 + x_1^2 y_4 + x_1 x_2 y_5 + x_2^2 y_6 \ge 3 - (x_1 - x_2)^2 (x_1 + x_2)^2,$

$S_2 = \{ (x_1, x_2) : |x_i| \le 1, \ i = 1, 2 \}.$

The associated moment problem $P_F$ reads:

compute $\max_{\alpha} \int_{S_2} [3 - (x_1 - x_2)^2 (x_1 + x_2)^2]\,d\alpha(x_1, x_2),$

when

$\int_{S_2} d\alpha(x_1, x_2) = 4$

$\int_{S_2} x_1\,d\alpha(x_1, x_2) = 0$

$\int_{S_2} x_2\,d\alpha(x_1, x_2) = 0$

$\int_{S_2} x_1^2\,d\alpha(x_1, x_2) = 2/3$

$\int_{S_2} x_1 x_2\,d\alpha(x_1, x_2) = 0$

$\int_{S_2} x_2^2\,d\alpha(x_1, x_2) = 2/3.$

By inspection we find that $P_F$ has the feasible solution with four mass points:

$m = 1$ at $(1/\sqrt{6},\ 1/\sqrt{6})$,

$m = 1$ at $(1/\sqrt{6},\ -1/\sqrt{6})$,

$m = 1$ at $(-1/\sqrt{6},\ 1/\sqrt{6})$,

$m = 1$ at $(-1/\sqrt{6},\ -1/\sqrt{6})$.

$D_{F_0}$ has the feasible solution $y_1 = 3$, $y_2 = y_3 = \ldots = y_6 = 0$. The preference function assumes the value
12 in both problems; that is, we have found an optimal solution.

We observe that every feasible solution $d\alpha(x_1, x_2)$ of $P_F$ has the same moments of degree up to 2, and hence we have found
the quadrature rule with positive weights:

$\int_{S_2} \phi(x_1, x_2)\,d\alpha(x_1, x_2) = \phi\Big(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\Big) + \phi\Big(\tfrac{1}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}\Big) + \phi\Big(-\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\Big) + \phi\Big(-\tfrac{1}{\sqrt{6}}, -\tfrac{1}{\sqrt{6}}\Big).$

The rule has positive weights and is exact if $\phi$ is a polynomial of two variables and of degree less
than 3.
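The four-point solution of the worked example can be verified directly: the point masses of weight 1 at (±1/√6, ±1/√6) reproduce the moment vector μ = (4, 0, 0, 2/3, 0, 2/3), and the preference functions of both problems equal 12.

```python
# Verify the four-point solution of the worked example.
import math, itertools
s = 1.0 / math.sqrt(6.0)
pts = [(s1 * s, s2 * s) for s1, s2 in itertools.product((1, -1), repeat=2)]
u = [lambda x: 1.0, lambda x: x[0], lambda x: x[1],
     lambda x: x[0] ** 2, lambda x: x[0] * x[1], lambda x: x[1] ** 2]
mu = [4.0, 0.0, 0.0, 2.0 / 3.0, 0.0, 2.0 / 3.0]
for ur, mur in zip(u, mu):
    assert abs(sum(ur(p) for p in pts) - mur) < 1e-12   # moments match mu

phi = lambda x: 3.0 - (x[0] - x[1]) ** 2 * (x[0] + x[1]) ** 2
P_val = sum(phi(p) for p in pts)            # value of P_F: phi = 3 at each point
D_val = 4.0 * 3.0 + (2.0 / 3.0) * (0.0 + 0.0)   # value of D_F0 at y = (3,0,...,0)
assert abs(P_val - 12.0) < 1e-12 and abs(D_val - 12.0) < 1e-12
print(P_val, D_val)
```

Since the two objective values agree, weak duality confirms that both solutions are optimal.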


3.1. Definition of Acceptable Approximations 

In this section we give the general principles of a computational scheme for the solution of the
problem $D_F$ (Definition 2.2).

DEFINITION 3.1: Let $y$ be any vector. We define the nonoptimality $\Delta Z(y)$ as $|Z - G(y)|$,
where Z is the optimal value of $D_F$.

DEFINITION 3.2: Let again $y$ be any vector. We define the discrepancy of $y$ by

$\delta(y) = \min_{x \in S_K} \Big( \sum_{r=1}^{n} y_r u_r(x) - \phi(x) \Big).$

Thus $y$ is an optimal solution vector if $\delta(y) \ge 0$ and $G(y) = Z$. Generally, one has to be content with
trying to find a vector $y$ such that $\delta(y) \ge -\delta_0$ and $|G(y) - Z| \le \epsilon_0$, where the positive numbers $\delta_0$ and $\epsilon_0$
are given tolerances. (Such a $y$ is called an acceptable approximation.) We want to show that this can
be done by means of a finite number of operations, provided these are carried out with sufficiently good
accuracy.
3.2 Cutting-plane Methods and Alternating Procedures 

$D_F$ amounts to minimizing a convex function over a compact convex set K. This set is not specified in
the form of a few simple equations. Instead, we know the supporting planes, which are:

$\sum_{r=1}^{n} y_r u_r(x) - \phi(x) \ge 0, \quad x \in S.$

One can therefore contemplate using the principles of the cutting-plane method of Kelley [33] and
Wolfe [51], with the accelerating device by Wolfe [51]. The first algorithm by Remez (see, e.g., Cheney
[9], p. 96) can be considered as a variant of the cutting-plane method.

We generalize this algorithm and define the following alternating procedure. (The word alternating
refers to the fact that the optimal solution of $D_F$ is computed by alternately minimizing G over subsets
of $K_F$ containing K and minimizing certain functions over S.)


The general step is: let $x^1, x^2, \ldots, x^{s-1}$ ($s \ge 2$) be given elements in S. Take $y^s$ as an optimal
solution vector of the problem

$\min G(y),$

subject to

$\sum_{r=1}^{n} y_r u_r(x^j) - \phi(x^j) \ge 0, \quad j = 1, 2, \ldots, s-1.$

Then define $x^s$ as an element in S which minimizes

$\sum_{r=1}^{n} y_r^s u_r(x) - \phi(x), \quad x \in S.$

If this last minimum is nonnegative, the process is terminated. Otherwise we generate $y^{s+1}, y^{s+2}, \ldots$.

THEOREM 3.1: The sequence $y^s, y^{s+1}, \ldots$ generated by the alternating procedure above converges toward an optimal solution of $D_F$.

PROOF: Since $y^{s+1}$ meets all the constraints imposed on $y^s$, $G(y^{s+1}) \ge G(y^s)$. The same is true for any
optimal vector of $D_F$. Hence $G(y^s) \le G(y^{s+1}) \le \ldots \le d$, where $d$ is the optimal value. Three cases
are conceivable, namely:

CASE A: The alternating process stops after a finite number of iterations.

CASE B: $\lim_s G(y^s) = d - \eta, \quad \eta > 0.$

CASE C: $\lim_s G(y^s) = d.$

If Case A occurs, the optimal vector has been reached, because the last vector satisfies the constraints
of K. We want to show that Case B is not possible. Since $\{y^s\}$ is an infinite sequence confined to the
compact set $K_F$, it has accumulation points. Let $y^*$ be such a point. Put $G(y^*) = d - \epsilon$, $\epsilon > 0$. Hence $y^*$
does not belong to K.

Let $\bar{x}$ be an element in S which minimizes $\sum_{r=1}^{n} y_r^* u_r(x) - \phi(x)$. Denote the corresponding minimum
by A. We must have $A < 0$, since $y^* \notin K$.

From the definition of $y^s$ we conclude

(*) $\sum_{r=1}^{n} y_r^s u_r(x^j) - \phi(x^j) \ge 0, \quad j = 1, 2, \ldots, s-1.$

Let now $\{y^{i_j}\}$ be a subsequence such that $y^{i_j} \to y^*$ and the $x^{i_j}$ tend toward an accumulation point $x^*$. We
find for each $j$:

$\sum_{r=1}^{n} y_r^{i_j} u_r(\bar{x}) - \phi(\bar{x}) \ge \sum_{r=1}^{n} y_r^{i_j} u_r(x^{i_j}) - \phi(x^{i_j}).$

Letting $j \to \infty$ we arrive at

$A = \sum_{r=1}^{n} y_r^* u_r(\bar{x}) - \phi(\bar{x}) \ge \sum_{r=1}^{n} y_r^* u_r(x^*) - \phi(x^*).$

But by (*)

$\sum_{r=1}^{n} y_r^* u_r(x^*) - \phi(x^*) \ge 0$ also,

since $y^{i_j} \to y^*$, $x^{i_j} \to x^*$. This contradicts $A < 0$, and hence Case B is not possible. Theorem 3.1 is
therefore proven.
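As a concrete sketch (not the authors' code), the alternating procedure can be run on Example 2.3. Hypothetical box bounds F1 = -2, F2 = 2 are added so the finite subproblems are bounded, as Definition 2.2 requires, and each two-variable subproblem is solved by naive vertex enumeration; the inner minimization over S is done on a fine grid.

```python
# Pure-Python sketch of the alternating procedure of Section 3.2 applied to
# Example 2.3: min y1 + y2/2  s.t.  y1 + y2*x >= 1/(1+x), x in [0, 1].
from itertools import combinations

phi = lambda x: 1.0 / (1.0 + x)
obj = lambda y: y[0] + 0.5 * y[1]

def solve_lp(points):
    """Minimize obj subject to y1 + x*y2 >= phi(x) for x in points, within the
    (hypothetical) box -2 <= yi <= 2, by enumerating polygon vertices."""
    cons = [(1.0, x, phi(x)) for x in points]          # a*y1 + b*y2 >= c
    cons += [(1, 0, -2.0), (-1, 0, -2.0), (0, 1, -2.0), (0, -1, -2.0)]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue
        y = ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)
        if all(a * y[0] + b * y[1] >= c - 1e-9 for a, b, c in cons):
            if best is None or obj(y) < obj(best):
                best = y
    return best

grid = [i / 1000 for i in range(1001)]
points = [0.0, 1.0]                       # initial elements x^1, x^2 of S
for _ in range(20):
    y = solve_lp(points)                  # outer step: minimize G
    x_new = min(grid, key=lambda x: y[0] + y[1] * x - phi(x))  # inner step
    if y[0] + y[1] * x_new - phi(x_new) >= -1e-9:
        break                             # y is feasible for D_F: stop
    points.append(x_new)
print(round(obj(y), 6), (round(y[0], 6), round(y[1], 6)))
```

For this instance the two initial constraint points already force the optimum y = (1, -1/2) with value 3/4, so the procedure terminates at the first feasibility test.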

The alternating method might be effective when an approximate solution vector of $D_F$ is known.

If the objective function of $D_F$ is linear, one can use the algorithms given in Gustafson [20] and
Gustafson-Kortanek-Rom [26]. As a matter of fact, if the objective function is convex, $D_F$ can be solved
by solving a sequence of semi-infinite programs. We show now that the solution vector of $D_F$ can be
constructed as an accumulation point of the sequence $y^0, y^1, \ldots$ constructed recursively as follows.

Let $y^0$ belong to K (see Definition 2.1).

When $y^0, y^1, \ldots, y^{l-1}$ ($l = 1, 2, \ldots$) are determined, we define the linear functions

$\pi_j(y) = G(y^j) + \sum_{r=1}^{n} (y_r - y_r^j) \Big( \frac{\partial G}{\partial y_r} \Big)_{y=y^j}.$

Then we define $y^l$ as the optimal solution of the problem

$\min \pi_{l-1}(y),$

subject to

$\pi_{l-1}(y) \ge \pi_j(y), \quad j = 0, 1, \ldots, l-2,$

$\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_K,$

$\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_F.$

From Kelley [33] we conclude that $\{y^l\}$ contains a subsequence converging toward an optimal solution
vector of $D_F$.

The methods discussed here can be used to construct an approximate solution of the system
NL, which is then solved by means of the rapidly converging Newton-Raphson iteration.

3.3 Approximation with Problems with a Finite Number of Constraints 

3.3.1 Generalities 

We first introduce: 

DEFINITION 3.3: A finite subset T of $S_K$, $T = \{x^1, x^2, \ldots, x^N\}$, is called a grid.

DEFINITION 3.4: We denote by problem $D_F - T$ the task:

Compute $\min G(y),$

subject to

$\sum_{r=1}^{n} y_r u_r(x^j) \ge \phi(x^j), \quad x^j \in T,$

$\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x), \quad x \in S_F.$

($S_F$ is the same set as in the definition of problem $D_F$.) This problem can be solved by standard techniques of mathematical programming. The solution of $D_F - T$ can be used to approximate that of $D_F$.
We shall now derive bounds for the discrepancy and nonoptimality, expressed in grid data and characteristics of the functions $u_1, u_2, \ldots, u_n$, $\phi$ and G.

3.3.2 Error Bounds for Optimal Solutions of $D_F - T$

We need two definitions.

DEFINITION 3.5: Let a grid T and a norm be given on $S_K$. The number $|T|$ given by

$|T| = \max_{x \in S_K} \min_{x^j \in T} \| x - x^j \|$

will be called the coarseness of T.

This definition of $|T|$ agrees with the concept of "density" in Cheney [9, p. 84], but it is not consonant with the definition in Gustafson [20, p. 350]. The latter could not be directly extended to multidimensional grids. Since $S_K$ is finite-dimensional, all norms are topologically equivalent, and therefore
any norm can be used.
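The coarseness of Definition 3.5 can be computed by brute force. The sketch below uses a hypothetical 5-by-5 uniform grid in the unit square, the Euclidean norm, and a fine sample of points standing in for the continuum $S_K$.

```python
# Brute-force coarseness |T| (Definition 3.5) of a uniform grid in [0,1]^2.
import math, itertools
m = 5                                               # grid points per axis
T = list(itertools.product([i / (m - 1) for i in range(m)], repeat=2))
sample = list(itertools.product([i / 100 for i in range(101)], repeat=2))
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
coarse = max(min(dist(x, xj) for xj in T) for x in sample)
# The worst point sits near the center of a grid cell, so |T| is close to the
# half-diagonal (0.25/2)*sqrt(2) ~ 0.1768; the finite sample lands slightly
# below that because it misses the exact cell centers.
print(round(coarse, 4))
```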

DEFINITION 3.6 (continuity modulus of a real-valued function $\psi$): If $\psi$ is continuous on $S_K$ we
define the function $\omega_\psi$ as follows:

$\omega_\psi(z) = \sup |\psi(x') - \psi(x'')|,$

subject to

$x' \in S_K, \quad x'' \in S_K, \quad \| x' - x'' \| \le z.$

$\omega_\psi$ is called the modulus of continuity of $\psi$. $\omega_\psi$ is nonnegative and increasing. Since $\psi$ is continuous,
$\lim_{z \to +0} \omega_\psi(z) = 0$. (Compare Cheney [9, p. 86].)

We can now prove:

LEMMA 3.1: Let $y^T$ be an optimal solution of $D_F - T$. Then

(3.1) $\delta(y^T) \ge -\Delta_T(|T|),$ where

(3.2) $\Delta_T(|T|) = \sum_{r=1}^{n} |y_r^T| \, \omega_{u_r}(|T|) + \omega_\phi(|T|).$

PROOF (the arguments are a slight generalization of those in Gustafson [20, pp. 351-352]): Put

$\psi = \sum_{r=1}^{n} y_r^T u_r - \phi.$

Let $x \in S_K$. We want to get a lower bound for $\psi(x)$. By the definition of $|T|$ there is a gridpoint $x^j$ such
that $\| x - x^j \| \le |T|$. We write

$\psi(x) = \psi(x^j) + \psi(x) - \psi(x^j),$

$\psi(x) \ge \psi(x^j) - |\psi(x) - \psi(x^j)|,$

$\psi(x) \ge \psi(x^j) - \omega_\psi(\| x - x^j \|) \ge -\omega_\psi(|T|),$

since $\psi(x^j) \ge 0$, $\| x - x^j \| \le |T|$, and $\omega_\psi$ is nonnegative and increasing. We find immediately

$\omega_\psi(|T|) \le \Delta_T(|T|).$ Q.E.D.
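Lemma 3.1 can be checked on a small hypothetical one-dimensional instance: $S_K = [0, 1]$, $u_1(x) = 1$, $u_2(x) = x$, $\phi(x) = \sqrt{x}$, the grid $T = \{0, 1\}$ (coarseness 1/2), and $y = (0, 1)$, which is feasible at both grid points; the moduli of continuity are estimated by brute force.

```python
# Numerical check of Lemma 3.1 on a hypothetical 1-D instance.
import math
xs = [i / 100 for i in range(101)]          # sample of S_k = [0, 1]
T, cT = [0.0, 1.0], 0.5                     # grid and its coarseness |T|
y = (0.0, 1.0)
psi = lambda x: y[0] + y[1] * x - math.sqrt(x)
assert all(psi(xj) >= 0 for xj in T)        # y is feasible on the grid

def modulus(f, z):                          # brute-force omega_f(z)
    return max(abs(f(a) - f(b)) for a in xs for b in xs if abs(a - b) <= z)

d = min(psi(x) for x in xs)                 # discrepancy: -1/4, at x = 1/4
bound = (abs(y[0]) * modulus(lambda x: 1.0, cT)
         + abs(y[1]) * modulus(lambda x: x, cT)
         + modulus(math.sqrt, cT))          # Delta_T(|T|) of (3.2)
assert d >= -bound                          # (3.1): delta(y) >= -Delta_T(|T|)
print(round(d, 4), round(bound, 4))
```

The bound (here about 1.21 against an actual discrepancy of -0.25) is conservative, as the a priori bound of Lemma 3.2 is even more so; its virtue is that it is computable from grid data alone.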

This result can be strengthened to:

LEMMA 3.2: Let $y^T$ be an optimal solution of $D_F - T$. Then

(3.3) $\delta(y^T) \ge -\Delta(|T|),$ where

(3.4) $\Delta(|T|) = \omega_\phi(|T|) + F \sum_{r=1}^{n} \omega_{u_r}(|T|), \quad F = \max\{|F_1|, |F_2|\}.$

PROOF: Use the fact that $|y_r^T| \le F$.

The bound in Lemma 3.2 is more conservative than that in Lemma 3.1, but it is a priori in the sense that we
do not need to know $y^T$ in order to evaluate it. Hence we can tell in advance how small $|T|$ must be
selected in order to get a discrepancy below a given tolerance. We next derive bounds for the nonoptimality.
LEMMA 3.3: Let $y^T$ be an optimal solution of $D_F - T$. Let

(3.5) $H = \sum_{r=1}^{n} c_r u_r$

be positive* over $S_K$ and put

(3.6) $\gamma = \min_{x \in S_K} H(x).$

Define

(3.7) $y_r^H = y_r^T + \gamma^{-1} c_r \Delta_T(|T|), \quad r = 1, 2, \ldots, n,$

where $\Delta_T$ is defined by (3.2). Then

(3.8) $\big| \tfrac{1}{2} (G(y^T) + G(y^H)) - G(\bar{y}) \big| \le \tfrac{1}{2} (G(y^H) - G(y^T)),$

where $\bar{y}$ is an optimal solution of $D_F$.

*The existence of H is guaranteed by Krein's condition.





PROOF: Put $Q_T = \sum_{r=1}^{n} y_r^T u_r$ and $Q_H = \sum_{r=1}^{n} y_r^H u_r$. We want to show that $Q_H(x) \ge \phi(x)$, $x \in S_K$. We write

$Q_H(x) - \phi(x) = Q_H(x) - Q_T(x) + Q_T(x) - \phi(x),$

$Q_H(x) - \phi(x) = \gamma^{-1} H(x) \Delta_T(|T|) + Q_T(x) - \phi(x),$

$Q_H(x) - \phi(x) \ge \Delta_T(|T|)(\gamma^{-1} H(x) - 1) \ge 0,$

where the last step uses Lemma 3.1. Hence $y^H$ is a feasible solution of $D_F$ and therefore $G(y^H) \ge G(\bar{y})$.

On the other hand, $y^T$ is the optimal solution of the relaxed problem $D_F - T$, and we can therefore conclude

$G(y^H) \ge G(\bar{y}) \ge G(y^T),$

from which (3.8) follows.


Lemma 3.3 can be used to derive a posteriori bounds for the nonoptimality. We can namely prove:

LEMMA 3.4: Let $y^T$ be an optimal solution of $D_F - T$. Then we can replace $\Delta_T$ in (3.7) by $\Delta$ in
(3.4), and (3.8) gives an a priori bound. Further, if the numbers $v_r$ are such that

$\Big| \frac{\partial G}{\partial y_r} \Big| \le v_r, \quad r = 1, 2, \ldots, n,$

everywhere, then

$\big| \tfrac{1}{2} (G(y^T) + G(y^H)) - G(\bar{y}) \big| \le \gamma^{-1} \Delta(|T|) \sum_{r=1}^{n} |c_r v_r|.$

Often bounds on the partial derivatives of $u_1, \ldots, u_n$ and $\phi$ are known. Then the expressions for
$\Delta$ and $\Delta_T$ can be simplified. Let $x'$ and $x' + h$ belong to $S_K$. Then the mean-value theorem gives

$|\phi(x' + h) - \phi(x')| \le \sum_{r=1}^{k} \Big| \Big( \frac{\partial \phi}{\partial x_r} \Big)_{x=\xi} \Big| \, |h_r|,$

where $\xi = x' + \theta h$ for some number $\theta$ in (0, 1). Put

$\kappa_\phi = \sup_{x \in S_K} \ \sup_{\|h\| = 1} \ \sum_{r=1}^{k} \Big| \frac{\partial \phi}{\partial x_r} \Big| \, |h_r|.$

Then

$|\phi(x' + h) - \phi(x')| \le \|h\| \kappa_\phi,$ giving

$\omega_\phi(\zeta) \le \zeta \kappa_\phi.$

Defining $\kappa_{u_r}$ in the same manner, we obtain

(3.10) $\Delta_T(\zeta) \le \zeta \Delta_T',$

(3.11) $\Delta(\zeta) \le \zeta \Delta',$

where

$\Delta_T' = \kappa_\phi + \sum_{r=1}^{n} |y_r^T| \kappa_{u_r},$

$\Delta' = \kappa_\phi + F \sum_{r=1}^{n} \kappa_{u_r}, \quad F = \max\{|F_1|, |F_2|\}.$

If we replace $\Delta_T$ and $\Delta$ by the bounds (3.10) and (3.11) and revise the arguments in the preceding
four lemmas, we arrive at Theorem 3.2.

THEOREM 3.2: If $u_1, u_2, \ldots, u_n$ and $\phi$ have bounded partial derivatives of the first order and
$y^T$ is an optimal solution of $D_F - T$, one can give explicit a priori bounds on the nonoptimality and
discrepancy of $y^T$. These bounds are proportional to $|T|$.

A further refinement is possible if $u_1, u_2, \ldots, u_n$ and $\phi$ have continuous partial derivatives of the
second order.

Let $y^T$ be an optimal solution of $D_F - T$. Put

$\psi = \sum_{r=1}^{n} y_r^T u_r - \phi.$

Then $\psi(x^j) \ge 0$ for all $x^j \in T$. A lower bound of $\psi(x)$, $x \in S_K$, can be constructed from the following lemma.

LEMMA 3.5: Let $x^1, x^2, \ldots, x^{k+1}$ be $k+1$ given points in $S_K$ and $h$ a number such that

$0 < |x_i^r - x_i^1| \le h, \quad i = 1, 2, \ldots, k; \quad r = 2, \ldots, k+1,$

and the determinant

$\det \begin{pmatrix} 1 & x_1^1 & \cdots & x_k^1 \\ 1 & x_1^2 & \cdots & x_k^2 \\ \vdots & & & \vdots \\ 1 & x_1^{k+1} & \cdots & x_k^{k+1} \end{pmatrix} \ne 0.$

Take a fixed point $x$ in the convex hull U of $x^1, x^2, \ldots, x^{k+1}$. Put $R(x) = \sup f(x)$, when $f$ varies over
all functions with continuous partial derivatives of the second order such that

$\Big| \frac{\partial^2 f}{\partial x_i \partial x_j} \Big| \le c_{ij}, \quad x \in U,$

where $c_{ij}$, $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, k$, are given constants, and $f$ meets the condition

$f(x^j) = 0, \quad j = 1, 2, \ldots, k+1.$

Then

(3.14) $R(x) = \sum_{i=1}^{k} \sum_{j=1}^{k} c_{ij} \Big| x_i x_j - \sum_{r=1}^{k+1} \lambda_r x_i^r x_j^r \Big|,$

where $\lambda_1, \lambda_2, \ldots, \lambda_{k+1}$ are determined by

$x = \sum_{r=1}^{k+1} \lambda_r x^r, \quad \sum_{r=1}^{k+1} \lambda_r = 1.$


REMARK: To determine $R(x)$ for a fixed $x$ is an instance of problem D in Section 1 when S is
infinite-dimensional: S is namely the space of all functions of $k$ variables which have continuous partial
derivatives of the second order.

PROOF: Without loss of generality we can assume that the coordinates are chosen such that
$x^1 = 0$. Taylor's formula (expanding about $x^1$) gives, since $f(x^1) = 0$:

$f(x^r) = \sum_{j=1}^{k} a_j x_j^r + \sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij}^r x_i^r x_j^r, \quad r = 2, 3, \ldots, k+1,$

where

$a_j = \Big( \frac{\partial f}{\partial x_j} \Big)_{x=x^1}, \quad b_{ij}^r = \frac{1}{2} \Big( \frac{\partial^2 f}{\partial x_i \partial x_j} \Big)_{x = \theta_r x^1 + (1 - \theta_r) x^r}, \quad 0 \le \theta_r \le 1.$


Hence we arrive at the problem:

$L = \sup \sum_{j=1}^{k} a_j x_j + \sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij} x_i x_j,$

subject to

(3.15) $\sum_{j=1}^{k} a_j x_j^r + \sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij}^r x_i^r x_j^r = 0, \quad r = 2, 3, \ldots, k+1,$

$|b_{ij}| \le c_{ij}, \quad |b_{ij}^r| \le c_{ij}.$

Since $x$ is in the convex hull of $\{x^r\}$, $r = 1, 2, \ldots, k+1$, we can write

$x = \sum_{r=1}^{k+1} \lambda_r x^r \quad \text{where} \quad \sum_{r=1}^{k+1} \lambda_r = 1, \quad \lambda_r \ge 0, \quad r = 1, 2, \ldots, k+1.$

(3.15) then gives

$\sum_{j=1}^{k} a_j x_j = \sum_{r=2}^{k+1} \lambda_r \sum_{j=1}^{k} a_j x_j^r = -\sum_{r=2}^{k+1} \lambda_r \sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij}^r x_i^r x_j^r.$

Thus we are left with the task

$L = \sup \sum_{i=1}^{k} \sum_{j=1}^{k} b_{ij} \Big( x_i x_j - \sum_{r=1}^{k+1} \lambda_r x_i^r x_j^r \Big).$

Hence we should take

$b_{ij} = c_{ij} \, \mathrm{sign} \Big( x_i x_j - \sum_{r=1}^{k+1} \lambda_r x_i^r x_j^r \Big).$

Entering $|b_{ij}| = |b_{ij}^r| = c_{ij}$, we get the bound sought. The determinant condition implies that $\lambda_1, \lambda_2, \ldots,$
$\lambda_{k+1}$ are uniquely determined by $x$ and $x^1, x^2, \ldots, x^{k+1}$, and hence the lemma is proven.
If we now make the substitutions $x^r = h\xi^r$, $x = h\xi$ in (3.14), we get

$R(x) \le h^2 \sum_{i=1}^{k} \sum_{j=1}^{k} c_{ij} \Big| \xi_i \xi_j - \sum_{r=1}^{k+1} \lambda_r \xi_i^r \xi_j^r \Big|.$

If now the bounds on the derivatives hold uniformly over $S_K$, we get

$\delta(y^T) \ge -e \, |T|^2,$

where $e$ is determined by the distribution of the grid points of T and the bounds of the second-order
partial derivatives on $S_K$. Hence, if we consider a sequence $T_1, T_2, \ldots$ of hypercubic grids, the bound
on $|\delta(y^{T_i})|$ decreases as the square of $|T_i|$, $i = 1, 2, \ldots$. Revising the arguments leading to Theorem
3.2 above, we find that the same holds true for the bound on the nonoptimality.


3.3.3 Convergence Results when |T| — »0 

LEMMA 3.6: To every ε > 0 there is an h > 0 such that if |T| < h and y^T is an optimal solution of D_F − T, there is a y which is an optimal solution of D_F and satisfies ||y − y^T|| < ε.

PROOF: The same arguments apply as in Gustafson [20, Theorem 3.3]. This result can be both 
generalized and sharpened. 

If D_F − T has an optimal solution y^T, we can associate with it the problem P_F − T given below.

DEFINITION 3.7: Let the grid T be {x^1, x^2, ..., x^N}. We denote by problem P_F − T the task:

Compute max \sum_{j=1}^{N} m_j \phi(x^j) + \sum_{r=1}^{n} (F_1 v_r^+ - F_2 v_r^-),

subject to

\sum_{j=1}^{N} m_j u_r(x^j) + v_r^+ - v_r^- = \frac{\partial G}{\partial y_r},   r = 1, 2, ..., n,

m_j \ge 0,   v_r^+ \ge 0,   v_r^- \ge 0.

P_F − T is a linear program (even if G is not linear) and hence has an optimal solution which corresponds to a point-mass distribution with, at most, n mass-points. Select such a solution of P_F − T with the minimum number of mass-points. Then with each T we can associate the vector z(T) given by

(3.16) z(T) = (\xi^1(T), \xi^2(T), ..., \xi^\nu(T), y(T)),

where the \xi^j are the mass-carrying points. An optimal solution of P_F − T and D_F − T is uniquely determined if z(T) is given, since the vectors \xi^1(T), ..., \xi^\nu(T) are linearly independent and v_r^+, v_r^- enter the solution if y_r = F_1 or F_2, respectively.

Let || · || be a norm on S_k. We define ||z|| by

||z|| = \sum_j ||\xi^j|| + \sum_r |y_r|.

THEOREM 3.3: Let T_j, j = 1, 2, ... be a sequence of grids such that |T_j| → 0 when j → ∞. With each T_j we associate the vector z(T_j), defined analogously with z(T) above. Then we can find a subsequence z(T_{j_i}) converging towards a vector z̄ which describes a solution of P_F and D_F.

PROOF: Since ν ≤ n, z(T) has at most n(k+1) components. S_k is a compact set and F_1 ≤ y_r ≤ F_2. Hence we can find a number B such that ||z(T_j)|| ≤ B. Therefore {z(T_j)} is confined to a bounded subset of a finite-dimensional Banach space. Thus {z(T_j)} is contained in a compact set. We first select a subsequence T_{j_1}, T_{j_2}, ... such that y(T_{j_s}) converges towards a vector y. Using the same arguments as in Theorem 3.3 in Gustafson [20], we establish that y is an optimal solution of D_F.

To each y(T_{j_s}) there is an optimal solution of P_F − T_{j_s}. We want to define vectors z(T_{j_s}) according to (3.16), which we do recursively as follows:

Let z(T_{j_s}) be given and of the form

z(T_{j_s}) = (\xi^1(T_{j_s}), ..., \xi^\nu(T_{j_s}), y(T_{j_s})).

Let P_F − T_{j_{s+1}} have an optimal solution with the mass-carrying points u^1, u^2, ..., u^ν̄. We now define \xi^l(T_{j_{s+1}}) equal to a vector from the set {u^1, u^2, ..., u^ν̄} which minimizes

||\xi^l(T_{j_s}) - u^\alpha||,

when 1 ≤ α ≤ ν̄. This is done for l = 1, 2, ..., min(ν, ν̄).

There are three cases:

(a) ν̄ < ν. The vectors in z(T_{j_s}) which have not been matched are put in the vector for z(T_{j_{s+1}}).

(b) ν̄ = ν. No subsequent change in the definition of z(T_{j_{s+1}}) is made.

(c) ν̄ > ν. The vectors from {u^1, u^2, ..., u^ν̄} which have not been matched are transferred to z(T_{j_{s+1}}).

Hence, in all cases max(ν, ν̄) vectors are put in the vector z(T_{j_{s+1}}), which also contains y(T_{j_{s+1}}).

In the manner described above, we define recursively z(T_{j_1}), z(T_{j_2}), .... In no case will any of these vectors contain more than n points.

We can now take a subsequence which converges towards an accumulation point, which we call z*, from which a solution of D_F and its corresponding point-masses can be constructed.

REMARK: From the construction of z* it is obvious that certain of the points represented in the vector z* carry the mass zero. They can hence be removed. Other points can be confluent; if this is the case, only one member from every group of confluent points is carried. The last-mentioned case is common. Compare Gustafson [20] and subsection 3.3.5.

The idea of approximating a semi-infinite program by an optimization problem with a finite number of constraints is, of course, not new. Convergence results of the same character can be found, e.g., in Vershik-Temel't [48] and Cheney [9, pp. 86-88].

3.3.4 Special Devices to Economize and Stabilize the Simplex Method 
In this subsection we consider the case when G is linear, that is,

G(y) = \sum_{r=1}^{n} y_r \mu_r.

(As noted in section 3.2, every convex semi-infinite program can be solved by solving a sequence of linear semi-infinite programs.)

In this case, D_F − T is a linear program and an optimal vector y^T can be constructed with the Simplex method.

Let now y be a candidate for y^T and put

(3.17) \psi_y = \sum_{r=1}^{n} y_r u_r - \phi.

We want to investigate whether \psi_y(x^j) \ge 0, x^j \in T. Let x^i \in T be such that \psi_y(x^i) > 0. For all x in S_k we find the bound

\psi_y(x) \ge \psi_y(x^i) - \omega_{\psi_y}(||x - x^i||).


We note that

\omega_{\psi_y}(t) \le \sum_{r=1}^{n} |y_r| \omega_{u_r}(t) + \omega_{\phi}(t),   t > 0.

If we put

S_1 = \{ x \mid \omega_{\psi_y}(||x - x^i||) \le \psi_y(x^i) \},

we can conclude that x \in S_1 implies \psi_y(x) \ge 0. Thus, in particular, if x^j \in T is in S_1, the corresponding column need not even be generated in order to establish that \psi_y(x^j) \ge 0.
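The screening test above admits a direct computational sketch. Assuming, purely for illustration, a Lipschitz modulus of continuity \omega_{\psi_y}(t) = Lt and the sup-norm, a grid point may be skipped whenever it lies within \psi_y(x^i)/L of x^i; the helper name and data layout below are hypothetical:

```python
# Hypothetical screening helper: with modulus omega(t) = lipschitz * t,
# the bound psi_y(x) >= psi_val - lipschitz * |x - x_prime| shows that
# every grid point within psi_val / lipschitz of x_prime already
# satisfies psi_y >= 0, so its Simplex column need not be generated.

def screen_grid(grid, x_prime, psi_val, lipschitz):
    """Split grid points into (skippable, must_check)."""
    radius = psi_val / lipschitz              # the set S_1 in the text
    skippable, must_check = [], []
    for x in grid:
        dist = max(abs(a - b) for a, b in zip(x, x_prime))   # sup-norm
        (skippable if dist <= radius else must_check).append(x)
    return skippable, must_check
```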

Another problem is that numerical difficulties can be anticipated when |T| is small, due to the fact that the Simplex algorithm calls for the solution of a linear system whose matrix of coefficients might be nearly singular. Each Simplex iteration consists of two stages:

A. Determination of a candidate vector y by solution of the system

(3.18) \sum_{r=1}^{n} y_r u_r(x^j) = \phi(x^j),   j = 1, 2, ..., n,

where x^j belongs to T and the corresponding rows are linearly independent. Then we have to find the sign of \psi_y(x^i) for all x^i \in T, where \psi_y is defined by (3.17).

B. If min \psi_y(x^i), x^i \in T, is attained for i = i_1, we introduce x^{i_1} into the next basis and hence we have to solve systems of the form

(3.19) \sum_{j} m_j u_r(x^j) = \mu_r,   r = 1, 2, ..., n,

m_j \ge 0.

As remarked in Gustafson [20], the abscissae often lie close together in pairs, which will cause the matrix of coefficients in (3.18) and (3.19) to be ill-conditioned.
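The pairing phenomenon can be illustrated numerically (this example is not from the paper). With the monomial basis u_r(x) = x^{r-1}, moving the abscissae together in pairs drives up the condition number of the coefficient matrix of (3.18):

```python
import numpy as np

def basis_matrix(abscissae, n):
    """Rows j, columns r, entries u_r(x_j) = x_j**(r-1)."""
    return np.vander(np.asarray(abscissae, dtype=float), n, increasing=True)

well_spread = [0.0, 0.25, 0.5, 0.75, 1.0]
paired      = [0.0, 0.001, 0.5, 0.501, 1.0]   # abscissae close in pairs

cond_spread = np.linalg.cond(basis_matrix(well_spread, 5))
cond_paired = np.linalg.cond(basis_matrix(paired, 5))
# cond_paired exceeds cond_spread by several orders of magnitude
```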

Consider first the problem of determining \psi_y(x) from (3.18). Let \tilde{y} be the computed value of y and put

\epsilon_j = \sum_{r=1}^{n} \tilde{y}_r u_r(x^j) - \phi(x^j),   j = 1, 2, ..., n.

Let further \Delta\psi_y(x) be the error in the value of \psi_y(x) caused by the fact that we use \tilde{y} instead of y. Gustafson [22] gives a bound in terms of the functions p_j defined by

\sum_{j=1}^{n} p_j(x) u_r(x^j) = u_r(x),   r = 1, 2, ..., n,

and

p(x) = \sum_{j=1}^{n} |p_j(x)|.

We note then that p(x) is a continuous function over S_k and that p(x^j) = 1, j = 1, 2, ..., n. Further, p is completely determined by the system u_1, u_2, ..., u_n and hence independent of the way we perform our computations.
our computations. 

\epsilon_j is, however, dependent both on the computer and the manner in which (3.18) is solved. Let \bar{\epsilon}_j be the value we should obtain if we inserted the exact vector y in (3.18) and evaluated the residuals computationally. Wilkinson [50, p. 252] states that if (3.18) is solved by means of Gaussian elimination with pivoting, then ||\epsilon|| \le 3||\bar{\epsilon}|| even if the system is ill-conditioned. Therefore, if we solve (3.18) in this mode, |\Delta\psi_y(x)| is as small as possible. However, the ordinary Simplex method does not provide pivoting when the sequence of linear systems is solved, a fact that may be the cause of the often-observed instability of linear programming codes. In contrast, the variant of Bartels-Golub-Saunders [52] holds promise to be more stable.

When we solve (3.19), it is crucial that the computed m_j remain positive.

3.3.5 Construction of an Initial Approximation for the Newton-Raphson Method

We discuss now how to construct an initial approximation for system NL (Definition 2.3), when the solution of D_F − T and P_F − T is known. In this subsection we make the following general assumptions:

A1: u_1, u_2, ..., u_n and \phi have continuous partial derivatives of the second order.

A2: G is linear:

(3.20) G(y) = \sum_{r=1}^{n} \mu_r y_r.

(If G has continuous partial derivatives, we put

(3.21) G(y) \approx G(y^*) + \sum_{r=1}^{n} (y_r - y_r^*) \Big( \frac{\partial G}{\partial y_r} \Big)_{y=y^*},

where y^* is a solution of D_F − T.)

A3: The matrix A_y(x) given by

(3.22) A_y(x) = \Big( \frac{\partial^2 \psi_y}{\partial x_i \partial x_j} \Big)

is positive definite when x is a zero of \psi_y, where \psi_y is defined by (3.17).

REMARK: The linearization (3.21) is used when we employ the iterative process described at the end of subsection 3.2. Assumption A3 entails that the zeroes correspond to strict minima of \psi_y. A3 is difficult to verify in advance.

The major problem in finding an approximate solution is to determine the number of mass-points in an optimal solution of D_F when the solution of D_F − T and its primal P_F − T are known. Let y^T be an optimal solution of D_F − T and let an optimal solution of P_F − T be described by the pairs

(3.23) (\xi^i, m^i),   i = 1, 2, ..., n'.



The vector \xi^i gives the location of the mass m^i. P_F − T may have many optimal solutions, but we can always take one with n' \le n.

DEFINITION 3.8: A subset C_j of \{\xi^i\}_{i=1}^{n'} is called a cluster if each member of C_j lies at most 3|T| from any other member of C_j, and C_j cannot be expanded by inclusion of more elements of \{\xi^i\}_{i=1}^{n'}.

Thus we divide the set (3.23) uniquely into q clusters, where 1 \le q \le n'.
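Definition 3.8 can be sketched computationally. The fragment below (hypothetical helper, one-dimensional mass-points) merges points transitively whenever they lie within 3|T| of each other; this agrees with the definition whenever the groups are separated by more than 3|T|:

```python
# Hypothetical sketch of Definition 3.8 for 1-D points: partition the
# mass-points into clusters so that points within 3*|T| of one another
# end up in the same cluster (transitive merging).

def partition_into_clusters(points, grid_size):
    threshold = 3.0 * grid_size
    clusters = []
    for p in points:
        # collect the existing clusters that p touches
        touching = [c for c in clusters
                    if any(abs(p - q) <= threshold for q in c)]
        merged = [p]
        for c in touching:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters
```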



Put

\psi_T = \sum_{r=1}^{n} y_r^T u_r - \phi

and define the matrix A_T analogously with A_y in (3.22).

C_j is called a point-group if A_T(\xi) is positive definite and

||\nabla \psi_T(\xi)|| \le 0.5 ||A_T(\xi)||,   for all \xi \in C_j.

Two cases are possible, namely: case a: all clusters are point-groups; case b: there is a cluster that is not a point-group.

LEMMA 3.7: Let A1, A2, A3 hold. Then there is a number h' such that if |T| < h', then all clusters are point-groups.

PROOF: Assume the contrary. Using Theorem 3.3, we can then select a sequence T_1, T_2, ... of grids such that |T_l| → 0 and such that the corresponding vectors z(T_l) tend to a vector z̄ while at least one of the clusters of z(T_l) is not a point-group. Let the clusters be C_1(l), C_2(l), ..., C_{q(l)}(l). Each of the clusters contains at most n elements and hence their diameter is less than 3n|T_l|. Hence all the mass-carrying points in a cluster converge toward the same point. Denote these limit-points by \xi_j, j = 1, 2, ..., q. Using Assumption A3, we conclude that there is a \delta_0 > 0 such that if ||x − \xi_j|| < \delta_0, j = 1, 2, ..., q, then ||A(x)|| \ge 4||\nabla\psi(x)||. The convergence of y_1, y_2, ... implies that there is an N_1 such that ||A_{T_l}(x)|| \ge 2||\nabla\psi_{T_l}(x)||, l > N_1, for ||x − \xi_j|| < \delta_0. In the same manner we establish that there is an N_2 such that l > N_2 implies that A_{T_l}(x) is positive definite. Therefore, if l > max(N_1, N_2), then C_j is a point-group, j = 1, 2, ..., q. Hence the sought contradiction is established.

If T is such that the set (3.23) can be subdivided into clusters, all of which are point-groups, we construct an initial approximation to system NL as follows:

1. The masses in each point-group are combined and allocated at the group's center of gravity. Hence we take q equal to the number of point-groups, and each point-group corresponds to a mass-point.

2. If a mass-point is less than 3|T| from the boundary, it is moved to the nearest boundary point.

3. The point-mass distribution so obtained is taken as the first approximation.

4. Equations (2.4) and (2.8) are determined by the distribution of mass-points.
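Steps 1 and 2 can be sketched as follows (hypothetical helper, one-dimensional mass-points on an interval [lo, hi]; the point-groups are given as lists of (location, mass) pairs):

```python
# Sketch of steps 1-2 above: combine the masses of each point-group at
# its center of gravity, then move any mass-point lying within 3*|T| of
# the boundary onto the nearest boundary point.

def initial_mass_points(point_groups, grid_size, lo, hi):
    result = []
    for group in point_groups:
        total = sum(m for _, m in group)
        center = sum(x * m for x, m in group) / total   # center of gravity
        if center - lo < 3 * grid_size:                 # step 2: snap to
            center = lo                                 # nearest boundary
        elif hi - center < 3 * grid_size:
            center = hi
        result.append((center, total))
    return result
```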

3.3.6 Use of Newton-Raphson Methods

Applying the methods described in the preceding section, we obtain an approximate solution of the system NL.

A solution of this system can be described by a vector z of the general structure

(3.24) z = (m_1, x^1, m_2, x^2, ..., m_q, x^q, v^+, v^-, y),

where the point x^i carries the mass m_i and y is an optimal solution of D_F.

DEFINITION 3.10: Any vector of the general structure (3.24) will be called a trial vector if m_i > 0, i = 1, 2, ..., q, v_r^+ \ge 0, r = 1, 2, ..., n, and v_r^- \ge 0, r = 1, 2, ..., n.

LEMMA 3.8: Any solution of (2.3)-(2.6), (2.8) which is a trial vector can be used to give a lower bound on the optimal value of D_F. If also

\sum_{r=1}^{n} y_r u_r(x) \ge \phi(x),   x \in S_k,

then y is an optimal solution of D_F.

PROOF: A trial vector which is a solution describes a feasible solution of P_F − T for a certain T. The conclusions then follow from known duality relations. Q.E.D.

We want to generate a sequence of trial vectors which converges toward an optimal solution. We write the system NL in the general form

W(z) = 0.

Assume a trial vector z^j is known. Then we want to construct a correction h^j such that

(3.25) ||W(z^j + h^j)|| < ||W(z^j)||,

and then put

(3.26) z^{j+1} = z^j + h^j,

if h^j can be selected such that z^j + h^j is a trial vector.

In the classical Newton-Raphson method, we take h^j as the solution b^j of the system

(3.27) \sum_{r=1}^{N} \frac{\partial W_i(z^j)}{\partial z_r} b_r^j + W_i(z^j) = 0,   i = 1, 2, ..., N,

where N is the number of components of z. If the matrix in (3.27) is singular, the methods in Ben-Israel [2] can be used.

In Ortega-Rheinboldt [40, p. 421], we find general criteria for the convergence of the Newton-Raphson method, but they cannot, in general, be used. However, if the matrix in (3.27) is regular for all z^j and condition (3.25) is met, then {z^j} converges toward a local minimum of the function ||W(z)||. The same is true for the modified sequence {z^j} obtained by putting h^j = \lambda b^j, where the real number \lambda is chosen such that condition (3.25) is met together with the requirement that z^{j+1} is a trial vector.
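The damped step (3.25)-(3.26) can be sketched as follows; the step-length rule (repeated halving of \lambda) and the toy system are illustrative choices, not prescriptions of the paper, and the trial-vector requirement is left as a caller-supplied predicate:

```python
import numpy as np

def damped_newton_step(W, J, z, is_trial=lambda z: True, max_halvings=30):
    b = np.linalg.solve(J(z), -W(z))     # classical Newton correction (3.27)
    lam = 1.0
    for _ in range(max_halvings):
        z_new = z + lam * b
        if is_trial(z_new) and np.linalg.norm(W(z_new)) < np.linalg.norm(W(z)):
            return z_new                 # condition (3.25) met
        lam *= 0.5                       # shrink the step
    return z                             # no acceptable step found

# toy system (illustrative): W(z) = (z0**2 - 2, z1 - 1)
W = lambda z: np.array([z[0] ** 2 - 2.0, z[1] - 1.0])
J = lambda z: np.array([[2.0 * z[0], 0.0], [0.0, 1.0]])

z = np.array([1.0, 0.0])
for _ in range(6):
    z = damped_newton_step(W, J, z)
# z approaches (sqrt(2), 1)
```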

The general idea is to generate a sequence of trial vectors until the norm of the residual falls below a value prescribed in advance. If this does not happen before a given maximum time has elapsed, the process is assumed to diverge. Then one can use the last trial vector found to construct better approximations by means of the grid-point methods described in earlier sections.

We note that D_F subsumes a large class of different problems. In many important particular instances, special methods can be used both to simplify the computational scheme and to establish properties of convergence and unicity. We will return to this in later papers.


3.3.7 Remarks on Sensitivity

It is well known from the numerical solution of special cases of D_F (see, e.g., Example 1.3 in the introduction) that small changes in input data cause large dislocations of x^i and m_i, but that the optimal value is not affected very much. The x^i are the locations of minima of the function

\sum_{r=1}^{n} y_r u_r - \phi,

and to determine these is an ill-conditioned task in one dimension (see, e.g., Wilkinson [49, p. 39]); this situation cannot be expected to improve when S_k has several dimensions.

A first-order a posteriori approximation of the sensitivity can be made if the matrix of (3.27) is regular. Let f(z) = G(z) and let z̄ be an approximation of a solution z*. Linearizing, we arrive at

df = f(z*) - f(z̄) \approx \langle \nabla f, b \rangle,

where b satisfies

M b = -W(z̄),

with

M_{ir} = \frac{\partial W_i(z̄)}{\partial z_r},

and \langle a, b \rangle denotes \sum_r a_r b_r. Hence we find ||b|| = ||M^{-1} W(z̄)|| \le ||M^{-1}|| ||W(z̄)||, and we get the approximation ||z̄ - z*|| \approx ||b||. The estimate of df can be written in an attractive manner. Using Lemma 2.1 in Gustafson [22], we get df = -\langle u, w \rangle, where w = W(z̄) and M^T u = \nabla f. Hence an approximate bound on |df| is given by |\langle u, w \rangle|.
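On made-up 2×2 data, the two expressions for df can be checked against one another; the matrix M, residual w, and gradient below are purely illustrative:

```python
import numpy as np

M      = np.array([[2.0, 1.0], [0.0, 3.0]])   # Jacobian M of W at z_bar
w      = np.array([0.06, -0.03])              # residual w = W(z_bar)
grad_f = np.array([1.0, 1.0])                 # gradient of the objective f

b  = np.linalg.solve(M, -w)                   # M b = -W(z_bar)
u  = np.linalg.solve(M.T, grad_f)             # adjoint: M^T u = grad f
df = -float(np.dot(u, w))                     # first-order change of f

# consistency check: <grad f, b> equals -<u, w>
```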
This concludes our treatment of the computational scheme. 

Recently (1973) a definition of cluster has been given which is independent of grid size. The convergence theorems are then different from Lemma 3.7.


[1] Baker, George A., Jr. and John L. Gammel, "Applications of the Principle of the Minimum Maximum Modulus to Generalized Moment Problems and Some Remarks on Quantum Field Theory," J. Math. Anal. and Appl. 33, 197-211 (1971).

[2] Ben-Israel, A., "A Newton-Raphson Method for the Solution of Systems of Equations," J. Math. Anal. and Appl. 15, 243-252 (1966).

[3] Ben-Israel, A., A. Charnes, and K. O. Kortanek, "Asymptotic Duality Over Closed Convex Sets," J. Math. Anal. and Appl. 35 (1971).

[4] Bojanic, R. and R. DeVore, "On Polynomials of Best One-Sided Approximation," L'Enseignement
Math. 12, 139-164 (1966).

[5] Buck, R. C, "Alternation Theorems for Functions of Several Variables," J. Approx. Theory 1, 
325-334 (1968). 

[6] Charnes, A. and W. W. Cooper, Management models and industrial applications of linear program- 
ming (J. Wiley and Sons, New York: 1961) Vols. I and II. 


[7] Charnes, A., W. W. Cooper, and K. O. Kortanek, "Duality, Haar Programs and Finite Sequence 

Spaces," Proc. Nat. Acad. Sci. U.S., 48, 783-786(1962). 
[8] Charnes, A., W. W. Cooper, and K. O. Kortanek, "On the Theory of Semi-Infinite Programming 

and a Generalization of the Kuhn-Tucker Saddle Point Theorem for Arbitrary Convex Functions," 

NRLQ 16, 41-51 (1969). 
[9] Cheney, W. E., Introduction to approximation theory (McGraw-Hill, Inc., N.Y., 1966). 
[10] Dantzig, G. B., Linear programming and extensions (Princeton University Press, N J., 1963). 
[11] DeVore, R., "One Sided Approximations of Functions," J. Approx. Theory I, 11-25 (1968). 
[12] Duffin, R. J., "Infinite Programs," in Linear Inequalities and Related Systems (ed. by H. W. Kuhn

and A. W. Tucker), Annals of Math. Studies No. 38, Princeton University Press, Princeton, 

N.J., pp. 157-170(1956). 
[13] Duffin, R. J., "An Orthogonality Theorem of Dines Related to Moment Problems and Linear 

programming," J. Combinatorial Theory 2, 1-26 (1967). 
[14] Duffin, R. J., "Duality Inequalities of Mathematics and Science," 401-423 in [39].
[15] Duffin, R. J. and L. A. Karlovitz, "Formulation of Linear Programs in Analysis I: Approximation 

Theory," SIAM Jour. 16, 662-675 (1968). 
[16] Fan, Ky, "On Systems of Linear Inequalities," in Linear Inequalities and Related Systems (ed.

by H. W. Kuhn and A. W. Tucker), Annals of Math. Studies No. 38, Princeton University Press, 

Princeton, N.J. (1956), pp. 99-156. 
[17] Fan, Ky, "Asymptotic Cones and Duality of Linear Relations," J. Approx. Theory 2, 152-159 

[18] Gorr, W., and K. O. Kortanek, "Numerical Aspects of Pollution Abatement Problems: Constrained 

Generalized Moment Techniques," IPP Report No. 12, School of Urban and Public Affairs, 

Carnegie-Mellon University (Oct. 1970). 
[19] Gorr, W., S.-A. Gustafson, and K. O. Kortanek, "Optimal Control Strategies for Air Quality 

Standards and Regulatory Policy," Environment and Planning 4, 183-192, (1972). 
[20] Gustafson, S.-A., "On the Computational Solution of a Class of Generalized Moment Problems," 

SIAM J. Numer. Analysis 7, 343-357 (1970). 
[21] Gustafson, S.-A., "Numerical Aspects of the Moment Problem," Fil.dr. Thesis, Institutionen 

for Informations Behandling, Stockholms Universitet, Stockholm, Sweden (Apr. 1970). 
[22] Gustafson, S.-A., "Control and Estimation of Computational Errors in the Evaluation on Interpola- 
tion Formulae and Quadrature Rules," Math. Computation 24, 847-854 (1970). 
[23] Gustafson, S.-A. and Germund Dahlquist, "On the Computation of Slowly Convergent Fourier 

Integrals," Methoden und Verfahren der Mathematischen Physik 6, 37-43 (1972). 
[24] Gustafson, S.-A. and K. O. Kortanek, "Analytical Properties of Some Multiple-Source Urban 

Diffusion Models," Environment and Planning, 4, 31-41, (1972). 
[25] Gustafson, S.-A. and K. O. Kortanek, "Mathematical Models for Air Pollution Control: Numerical 

Determination of optimizing Abatement Policies" to appear in Models for Environmental 

Pollution Control (R. A. Deininger, Ed.), Ann Arbor Science Press, Ann Arbor, Mich. 
[26] Gustafson, S.-A., K. O. Kortanek, and W. Rom, "Non-Cebysevian Moment Problems," SIAM J. 

Numer. Analysis 7,335-342 (1970). 
[27] Gustafson, S.-A. and J. Martna, "Numerical Treatment of Size Frequency Distributions with 

Computer Machine," Geologiska Foreningens Forhandlingar 84, 372-389 (1962). 
[28] Gustafson, S.-A. and W. Rom, "Applications of Semi-Infinite Programming to the Computa- 


tional Solution of Approximation Problems," Tech. Report No. 88, Dept. of Operations Re- 
search, Cornell University, Ithaca, N.Y. (Sept. 1969). 
[29] Haar, A., "Über lineare Ungleichungen," Acta Math. (Szeged) 2, 1-14 (1924).
[30] John, Fritz, "Extremum Problems with Inequalities as Side Conditions, in: Studies and essays, 

Courant Anniversary Vol. (ed. K. O. Friedrichs, O. E. Neugebauer, and J. J. Stoker) J. Wiley 

and Sons, Inc., New York, pp. 187-204 (1948). 
[31] Kantorovich, L. V. and G. Sh. Rubinshtein, "Concerning a Functional Space and Some Extremum 

Problems," Dokl. Akad. Nauk. SSSR 115, 1058-1061 (1957). 
[32] Karlin, S. and W. J. Studden. Tchebycheff Systems: with Applications in Analysis and Statistics 

Interscience Publishers, J. Wiley and Sons, Inc., New York, (1966). 
[33] Kelley, J. E., Jr., "The Cutting Plane Method for Solving Convex Programs," J. SIAM 8, 703-712 

[34] Kortanek, K. O. and J. P. Evans, "Pseudo-Concave Programming and Lagrange Regularity," 

Operations Research 15, 882-891 (1967).
[35] Krafft, Olaf, "Programming Methods in Statistics and Probability Theory," 425-446 in [39]. 
[36] Kretschmer, K. S., "Programmes in Paired Spaces," Can. J. Math. 13, 221-238 (1961). 
[37] Kretschmer, K. S., "Linear Programming in Locally Convex Spaces and Its Use in Analysis," 

Ph.D. Thesis, Carnegie-Mellon University, Pittsburgh, Pa. (1958). 
[38] Meinardus, Günter, Approximation of Functions: Theory and Numerical Methods (Springer-Verlag, New York, Inc., 1967).
[39] Nonlinear programming (ed. J. B. Rosen, O. L. Mangasarian, and K. Ritter) (Academic Press, 

New York, 1970). 
[40] Ortega, J. M. and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables 

(Academic Press, New York and London, 1970). 
[41] Powell, M. J. D., "On the Maximum Errors of Polynomial Approximation Defined by Interpola- 
tion and by Least Square Criteria," Comp J. 9, 404-407 (1966). 
[42] Rivlin, T. J. and H. S. Shapiro, "A unified Approach to Certain Problems of Approximation and 

Minimization," SIAM J. Appl. Math. 9, 670-699 (1961). 
[43] Rubinshtein, G. Sh., "Investigations on Dual Extremal Problems," Doctoral Dissertation, Inst. 

Matem. SO AN SSSR, Novosibirsk (1965). 
[44] Shapiro, H. S., "On a Class of Extremal Problems for Polynomials in the Unit Circle," Portugaliae 

Math. 20, 67-93 (1961). 
[45] Shohat, J. A. and J. D. Tamarkin, "The Problem of Moments," Mathematical Surveys, No. 1, 

Am. Math. Soc, New York (1943). 
[46] Stiefel, E., "Note on Jordan Elimination, Linear Programming and Tchebycheff Approximation," 

Numer. Math. 2, 1-17 (1960). 
[47] Todd, J., A survey of numerical analysis (McGraw-Hill, New York, 1962). 
[48] Vershik, A. M. and V. Temel't, "Some Questions Concerning the Approximation of the Optimal

Value of Infinite-Dimensional Problems in Linear Programming," Sibirskii Matematicheskii 

Zhurnal 9, 591-601 (1968). 
[49] Wilkinson, J. H., Rounding Errors in Algebraic Processes (Prentice-Hall, Inc., Englewood Cliffs, 

N.J., 1963). 
[50] Wilkinson, J. H., The Algebraic Eigenvalue Problem (Clarendon Press, Oxford, 1965). 


[51] Wolfe, Philip, "Accelerating the Cutting Plane Method for Nonlinear Programming," J. Soc. 
Indust. Appl. Math. 9, 481-488 (1961). 

[52] Bartels, R. H., G. H. Golub, and M. A. Saunders, "Numerical Techniques in Mathematical Programming," in Nonlinear Programming (ed. by J. B. Rosen, O. L. Mangasarian, and K. Ritter), Academic Press, New York, pp. 123-176 (1970).


W. L. Wilkinson 
The George Washington University 


This paper presents an algorithm for determining the upper and lower bounds for arc 
flows in a maximal dynamic flow solution. The procedure is basically an extended applica- 
tion of the Ford-Fulkerson dynamic flow algorithm which also solves the minimal cost flow 
problem. A simple example is included. The presence of bounded optimal arc flows entertains 
the notion that one can pick a particular solution which is preferable by secondary criteria. 


Ford and Fulkerson [1] introduced the notion of maximal dynamic flows in networks and provided 
an ingenious algorithm for solving the dynamic linear programming problem. A dynamic network 
consists of arcs and nodes with two nonnegative integers associated with each arc. One of the integers 
defines the capacity of the arc and the other the time required to traverse the arc. There are two dis- 
tinguished nodes in the network, one for the source where all flows originate and one for the sink 
where all flows terminate. If at each node the commodity can either be transshipped immediately or 
held over for later shipment, what is the maximal amount of commodity flow from source to sink in 
a specified number of time periods? Solutions constructed by the Ford-Fulkerson algorithm have the 
attractive property of being presented as a relatively small number of activities (chain flows which 
represent a shipping schedule) which are repeated over and over (temporal repetition) until the end of 
the allotted time span. A consequence of this temporal repetition is that a single arc flow value in each 
arc represents an optimal solution independent of how these arc flows are decomposed into chain 
flows. In networks of operational interest, these optimal arc flow values frequently have an upper 
bound different from the lower bound. These bounds say that one can always find an optimal chain
flow solution which lies on or between the stated bounds, and that no optimal solution lies outside these bounds.
The procedure set forth in the sequel calculates these boundary values for each arc. 

As shown in [2], the Ford-Fulkerson dynamic flow algorithm also solves the minimal cost flow 
problem. In this problem, roughly described, we are given a network having one or more sources and 
one or more sinks with availabilities at the sources and requirements at the sinks. There are inter- 
mediate nodes between the sources and sinks with connecting arcs having assigned capacities and 
unit shipping costs. The problem is to construct a feasible flow, if one exists, which minimizes cost in 
satisfying the requirements within the given availabilities. Similarly to the dynamic flow problem, the 
bounds on optimal arc flows will indicate the variety of ways, if any, in which such a feasible flow can 
be constructed. 

Before describing the computing procedure for bounded flows, we will give a more formal statement 
of the dynamic flow problem referred to above. 




Given the network G = [N; A] with source s and sink t in the node set N, we let nonnegative integers c(x, y) and a(x, y) be the capacity and traversal time, respectively, of each arc (x, y) in the arc set A. Let f(x, y; \tau) be the amount of flow that leaves x along (x, y) at time \tau, consequently arriving at y at \tau + a(x, y). Also, f(x, x; \tau) is the holdover at x from \tau to \tau + 1. If V(P) is the net flow leaving s or entering t during the P periods 0 to 1, 1 to 2, ..., P − 1 to P, then the problem may be stated as the linear program:

Maximize V(P),

subject to

\sum_{\tau=0}^{P} \sum_{y} [f(s, y; \tau) - f(y, s; \tau - a(y, s))] - V(P) = 0,

\sum_{y} [f(x, y; \tau) - f(y, x; \tau - a(y, x))] = 0,   x \ne s, t;  \tau = 0, 1, ..., P,

0 \le f(x, y; \tau) \le c(x, y).

Here a(x, x) = 1 and c(x, x) = ∞ for holdovers at node x. If f(x, y; \tau) and V(P) satisfy the above constraints, we call f a dynamic flow from s to t (for P periods) and say that f has value V(P). If also V(P) is maximal, then f is a maximal dynamic flow.
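One standard way to view this linear program (a textbook construction, not one of the routines of this paper) is through the time-expanded network: one copy of each node per period, a movement arc for each feasible departure time, and holdover arcs of infinite capacity. A static maximal flow on the expanded network then has the same value as the maximal dynamic flow. A sketch:

```python
# Standard time-expansion sketch (hypothetical dict-based data layout).

def time_expand(arcs, P):
    """arcs: dict (x, y) -> (capacity, traversal_time). Returns static arcs
    ((x, tau), (y, tau + a)) -> capacity, plus holdover arcs at each node
    with a(x, x) = 1 and infinite capacity."""
    expanded = {}
    nodes = {v for arc in arcs for v in arc}
    for (x, y), (c, a) in arcs.items():
        for tau in range(P - a + 1):          # departures that arrive by P
            expanded[((x, tau), (y, tau + a))] = c
    for x in nodes:
        for tau in range(P):                  # holdover from tau to tau + 1
            expanded[((x, tau), (x, tau + 1))] = float("inf")
    return expanded
```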


To initiate the bounded arc flow computations, a maximal dynamic flow solution is required. The Ford-Fulkerson algorithm is used to obtain such a solution. We set forth their algorithm here, using the notation of [2], as Routine I in the interests of a coherent presentation for the convenience of the reader.

Routine II is an application of Kirchhoff's first general law on the conservation of flow at any node in a network. Routine II calculates "slack bounds"; slack in the sense that flow values contained by these bounds may not all be optimal; however, no optimal arc flow values are excluded by the bounds. Routine II is very efficient in calculating bounds based on local information at a node and is retained for that reason.

Routine III tightens these bounds to their true values where necessary. Routine III is, essentially, an application of Routine I (the Ford-Fulkerson algorithm) to a subnetwork composed of a special set of admissible arcs from the original network. Using the original flow solution and the flow boundaries from Routine II, optimal flows are circulated through the admissible arcs about a selected arc being scanned. Treating the initial and terminal nodes of the arc being scanned as a temporary source and sink, optimal flows are maximized and minimized in the arc, thereby determining the absolute upper and lower optimal arc flow boundaries. If at any time the varying optimal flow values in the admissible arcs are observed to reach a bound of Routine II, that bound has been verified, since a known solution lies on a bound which is suspected of being too loose. Consequently, several unscanned arcs may get scanned in the process of scanning a particular arc. The reader may note that Routine III, slightly revised, could compute the true bounds without the aid of Routine II. Experience has shown that retaining the services of Routine II saves a substantial amount of computation time.

Ford-Fulkerson Algorithm

Initial Conditions

1. Establish P, the time span of interest.

2. Set node numbers \pi(x) = 0 for all x.

3. Define d(x, y) = \pi(x) + a(x, y) - \pi(y) and consider as an admissible arc any (x, y) where d(x, y) = 0.

4. Set all f(x, y) = 0.

5. During the routine a node is in one of the following states:

Unlabeled and unscanned,

Labeled but unscanned, or

Labeled and scanned.

6. All nodes are unlabeled.

Arc Flow Generating Routine 

STEP 1. To node s assign the label [+s, ∞] and consider node s as unscanned.

STEP 2. Take any labeled, unscanned node x and suppose that it is labeled [±w, h]. To all nodes y that are unlabeled and such that:

a. (x, y) is admissible and f(x, y) < c(x, y), assign the label [+x, min (h, c(x, y) - f(x, y))], or if

b. (y, x) is admissible and f(y, x) > 0, assign the label [-x, min (h, f(y, x))].

Consider node x as scanned and any newly labeled y-nodes as unscanned. Repeat until:

a. node t is labeled (breakthrough), or

b. no new labels are possible and node t is unlabeled (non-breakthrough).

STEP 3. If breakthrough results, suppose node t is labeled [+y, h]; replace f(y, t) by f(y, t) + h. Next turn attention to node y. In general, if y is labeled [+x, m], replace f(x, y) by f(x, y) + h, or if y is labeled [-x, m], replace f(y, x) by f(y, x) - h. Next turn attention to node x. Ultimately the node s is reached; at this point stop the replacement process. Starting with the new flows thus generated, discard the old labels and repeat the above Arc Flow Generating Routine until no new labels are possible and node t cannot be labeled. When this condition results, proceed with the following Non-Breakthrough Processing Routine.
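The labeling scheme of Steps 1-3, specialized to a static network in which every arc is treated as admissible, can be sketched as follows (hypothetical dict-based data structures; labels are pairs of a signed predecessor and a flow increment h):

```python
from collections import deque

def max_flow_by_labeling(nodes, cap, s, t):
    flow = {arc: 0 for arc in cap}
    total = 0
    while True:
        label = {s: (None, float("inf"))}          # STEP 1
        queue = deque([s])
        while queue and t not in label:            # STEP 2
            x = queue.popleft()
            h = label[x][1]
            for y in nodes:
                if y in label:
                    continue
                if cap.get((x, y), 0) - flow.get((x, y), 0) > 0:   # rule a
                    label[y] = (("+", x), min(h, cap[(x, y)] - flow[(x, y)]))
                    queue.append(y)
                elif flow.get((y, x), 0) > 0:                      # rule b
                    label[y] = (("-", x), min(h, flow[(y, x)]))
                    queue.append(y)
        if t not in label:                         # non-breakthrough
            return flow, total
        h = label[t][1]                            # STEP 3: augment by h
        y = t
        while y != s:
            sign, x = label[y][0]
            if sign == "+":
                flow[(x, y)] += h
            else:
                flow[(y, x)] -= h
            y = x
        total += h
```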

Non-Breakthrough Processing

STEP 1. Calculate a value of \delta as follows: Define

A_1 = {(x, y) | x ∈ X, y ∈ X̄, d(x, y) > 0},

A_2 = {(x, y) | x ∈ X̄, y ∈ X, d(x, y) < 0},

where X is the subset of labeled nodes and X̄ is the complementary subset of unlabeled nodes. Let

\delta_1 = min over (x, y) ∈ A_1 of d(x, y), or P + 1 - \pi(t) if A_1 is empty,

\delta_2 = min over (x, y) ∈ A_2 of -d(x, y), or P + 1 - \pi(t) if A_2 is empty,

\delta = min (\delta_1, \delta_2).

Now define for all x new node numbers \pi'(x) as

\pi'(x) = \pi(x), if x is labeled,

\pi'(x) = min [\pi(x) + \delta; \pi(x) + P + 1 - \pi(t)], if x is unlabeled.

After new node numbers have been assigned, consider \pi'(x) as \pi(x).

STEP 2. If \pi(t) < P + 1, return to the Arc Flow Generating Routine. If \pi(t) = P + 1, the algorithm terminates.
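STEP 1 of the non-breakthrough processing can be sketched as follows (hypothetical dict-based inputs: \pi as a dict of node numbers, d as a dict of modified arc lengths, and the set of labeled nodes):

```python
# Sketch of the non-breakthrough update: compute delta from the crossing
# arc sets A1, A2 and raise the node numbers of the unlabeled nodes.

def update_node_numbers(pi, d, labeled, t, P):
    cap_change = P + 1 - pi[t]
    a1 = [d[a] for a in d
          if a[0] in labeled and a[1] not in labeled and d[a] > 0]
    a2 = [-d[a] for a in d
          if a[0] not in labeled and a[1] in labeled and d[a] < 0]
    delta1 = min(a1) if a1 else cap_change
    delta2 = min(a2) if a2 else cap_change
    delta = min(delta1, delta2)
    return {x: pi[x] if x in labeled
            else min(pi[x] + delta, pi[x] + cap_change)
            for x in pi}
```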


Slack Arc Flow Bounds

Initial Conditions

1. A maximal dynamic flow solution has been obtained for a particular P. Retain d(x, y) and f(x, y) for all (x, y). Retain \pi(x) for all x.

2. Set for all (x, y):

G(x, y)/g(x, y) = 0/0, if d(x, y) > 0,

G(x, y)/g(x, y) = c(x, y)/0, if d(x, y) = 0,

G(x, y)/g(x, y) = c(x, y)/c(x, y), if d(x, y) < 0.

3. Add an arc (t, s) and set G(t, s) = g(t, s) = \sum_y f(s, y).

4. Order all nodes in increasing \pi(x) sequence with no preference where equality exists.

5. All nodes are unscanned.

6. If d(x, y) = 0, then (x, y) is an admissible arc.


STEP 1. Take the lowest ordered, unscanned node x and to all admissible arcs calculate and
insert into the arc record

(1a) G'(x, y) = min [G(x, y); Σᵢ G(i, x) − Σⱼ g(x, j) + g(x, y)],

(1b) g'(x, y) = max [g(x, y); Σᵢ g(i, x) − Σⱼ G(x, j) + G(x, y)].

If G'(x, y) < G(x, y) or g'(x, y) > g(x, y), consider y as unscanned.* Now consider the newly assigned
G'(x, y) and g'(x, y) as G(x, y) and g(x, y), respectively. When all admissible arcs have been examined,
consider x as scanned, proceed to the next lowest ordered unscanned node and scan that node. Scan
all nodes.
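One source-to-sink sweep applying Equations (1a) and (1b) can be sketched as follows. This is a minimal dict-based sketch; the return arc (t, s) of Initial Condition 3 must be present or the inflow sum at s is empty, and all identifiers are ours:

```python
def sweep(order, arcs, G, g):
    """Tighten bounds per Equations (1a)/(1b); return the set of arcs changed."""
    changed = set()
    for x in order:                       # nodes in increasing pi(x) order
        for (u, v) in arcs:
            if u != x:
                continue                  # only arcs leaving the scanned node
            inflow_G = sum(G[a] for a in arcs if a[1] == x)
            inflow_g = sum(g[a] for a in arcs if a[1] == x)
            out_G = sum(G[a] for a in arcs if a[0] == x)
            out_g = sum(g[a] for a in arcs if a[0] == x)
            # (1a): upper bound limited by available inflow minus committed outflow
            Gp = min(G[(u, v)], inflow_G - out_g + g[(u, v)])
            # (1b): lower bound forced by required inflow minus available outflow
            gp = max(g[(u, v)], inflow_g - out_G + G[(u, v)])
            if Gp < G[(u, v)] or gp > g[(u, v)]:
                changed.add((u, v))       # arc head becomes unscanned again
            G[(u, v)], g[(u, v)] = Gp, gp
    return changed
```

On a path s → x → t with c(s, x) = 5, c(x, t) = 3 and a return arc fixed at 3/3, one sweep pins both path arcs to 3/3.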

STEP 2. When all nodes have been scanned, then consider all nodes as unscanned. Take the
highest ordered, unscanned node y and to all admissible arcs calculate and insert in the arc record

(2a) G'(x, y) = min [G(x, y); Σⱼ G(y, j) − Σᵢ g(i, y) + g(x, y)],

(2b) g'(x, y) = max [g(x, y); Σⱼ g(y, j) − Σᵢ G(i, y) + G(x, y)].

If G'(x, y) < G(x, y) or g'(x, y) > g(x, y), consider x as unscanned. Now consider the newly assigned
G'(x, y) and g'(x, y) as G(x, y) and g(x, y), respectively. When all admissible arcs have been exam-
ined, consider y as scanned, proceed to the next highest ordered unscanned node and scan that node.
Scan all nodes.

STEP 3. When all nodes have been scanned, then consider all nodes as unscanned. Take the lowest
ordered, unscanned node x and to all admissible arcs calculate and insert in the arc record the results
of Equations (1). Consider x as scanned. If G'(x, y) < G(x, y) or g'(x, y) > g(x, y), consider y as un-
scanned. Proceed to the next lowest ordered, unscanned node and scan that node. Scan all nodes. This
terminates Routine II. Go to Routine III.

NOTE: As described above, Routine II sweeps from source to sink, sink to source, and then source
to sink. Computational experience has indicated that three sweeps achieve the best compromise
between best bounds and reasonable computing times. One could specify repetitive sweeps until a
complete sweep had been made with no changes to G(x, y)/g(x, y). Alternatively, premature termi-
nation is allowed at any point since we are only seeking approximations to the true G(x, y)/g(x, y).


Taut Arc Flow Bounds 

Initial Conditions 

1. Label all arcs as follows:

a. "Gg" if f(x, y) = G(x, y) = g(x, y),

b. "G+" if f(x, y) = G(x, y) ≠ g(x, y),

c. "+g" if f(x, y) = g(x, y) ≠ G(x, y), or

d. "++" if none of the above is true.

*The scan state of y may be either scanned or unscanned. If neither condition is met, then this state remains unchanged.
This potential redundancy is necessary to accommodate the d(x, y) = 0 instances where π(x) = π(y); thus both arcs, (x, y) and
(y, x), may be admissible. The stated inequalities prevent looping.


2. Mark all arcs labeled "Gg" and consider them scanned. 

3. An unmarked arc is an admissible arc. 

4. All nodes are unlabeled and unscanned. 

5. If all arcs are scanned, terminate the routine. Otherwise, go to the procedure below. 
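The four-way labeling of Initial Condition 1 is a simple classification of each arc's flow against its current bounds; it can be sketched as a small helper (the function name is ours):

```python
def arc_label(f, G, g):
    """Initial label for the Taut Arc Flow Bounds routine, per conditions 1a-1d."""
    if f == G == g:
        return "Gg"   # flow pinned between coincident bounds: already firm
    if f == G:
        return "G+"   # flow sits at the upper bound only
    if f == g:
        return "+g"   # flow sits at the lower bound only
    return "++"       # flow strictly between distinct bounds
```

Arcs labeled "Gg" are marked and considered scanned from the outset, so only "G+", "+g", and "++" arcs remain to be processed.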


STEP 1. Take any unscanned arc (x, y) and consider x as s' and y as t'.

a. If (s', t') is labeled "+g" go to STEP (2) below and omit STEP (3) below.

b. If (s', t') is labeled "G+" go to STEP (3) below.

c. If (s', t') is labeled "++" go to STEPS (2) and (3) below.

STEP 2. To node t' assign the label [+s', G(s', t') − f(s', t')]. Take a labeled unscanned node x,
initially t' is the only such node, and suppose it is labeled [±w, h]; to all nodes y that are unlabeled and

a. (x, y) is admissible and f(x, y) < G(x, y), assign the label [+x, min(h, G(x, y) − f(x, y))],
or if

b. (y, x) is admissible and f(y, x) > g(y, x), assign the label [−x, min(h, f(y, x) − g(y, x))].

Consider node x as scanned. Repeat until node s' is labeled (breakthrough) or no new labels are possible
and node s' is unlabeled (non-breakthrough). If breakthrough results and node s' is labeled [+y, h],
replace f(y, s') by f(y, s') + h, or if node s' is labeled [−y, h], replace f(s', y) by f(s', y) − h. In either
case, if the arc (y, s') or (s', y) was previously considered scanned, it remains scanned bearing the
label "Gg." If not previously scanned, consider the following cases.

a. Node s' is labeled [+y, h]. If the new f(y, s') = G(y, s') and the current label is "+g,"
relabel the arc "Gg" and consider it scanned; or if the current label is "++," relabel the arc "G+"
and consider it unscanned. If the new f(y, s') < G(y, s'), the arc retains its current label and remains
unscanned.

b. Node s' is labeled [−y, h]. If the new f(s', y) = g(s', y) and the current label is "G+," re-
label the arc "Gg" and consider it scanned; or if the current label is "++," relabel the arc "+g" and
consider it unscanned. If the new f(s', y) > g(s', y), the arc retains its current label and remains
unscanned.

Next, turn attention to node y and repeat the replacement and labeling process as for (y, s') or (s', y),
incrementing or decrementing the flow value by h and determining whether the arc is to be considered
scanned or unscanned. Continue this replacement process until a reverse path to node t' has been
traced out. At this point, stop the replacement process. If f(s', t') = G(s', t'), a condition for non-
breakthrough exists. If not, then starting with the new flows thus generated, discard the old node labels,
consider all nodes as unscanned and repeat this Procedure until non-breakthrough results. When non-
breakthrough results, record G(s', t') = f(s', t'). Erase all node labels, consider all nodes as unscanned
and proceed to STEP 3 below if 1b is satisfied. Otherwise, go to STEP 4.

STEP 3. Do STEP 2 except assign the initial label to node s' as [−t', f(s', t') − g(s', t')], stop
the labeling and replacement breakthrough process at t' and, on non-breakthrough, record g(s', t')
= f(s', t'). Then go to STEP 4 below.

STEP 4. Consider (s', t') as scanned and reverted to its original (x, y) designation. Erase all node
labels and consider all nodes as unscanned. Take the next unscanned arc (x, y) and repeat all of the
above Procedure. If no such arc exists, terminate the routine. G(x, y) and g(x, y) are the firm upper and
lower bounds, respectively, for optimal arc flows.



We stated earlier that the Bounded Flow Routine II has proven useful in reducing computing 
times. Since Routine III operates on the slack bounds produced by Routine II, it is necessary to show 
that optimal arc flow values are never excluded by Routine II. We will now prepare the way for stating 
such a theorem. 

LEMMA 1: The sequence of equation pairs for G'(x, y) and g'(x, y) in the Bounded Flow Routine
II will produce monotonic nonincreasing values of G'(x, y) and monotonic nondecreasing values of
g'(x, y) for all (x, y).

PROOF: The truth of the assertion is a natural consequence of the structure of Equations (1) and
(2) for G'(x, y) and g'(x, y), where the upper limit for G'(x, y) is G(x, y) and the lower bound for g'(x, y)
is g(x, y). Independent of the number of iterations of the equation pairs, the previously calculated value
for G(x, y) or g(x, y) provides a bound consistent with the assertion above.

LEMMA 2: If G(x, y) ≥ g(x, y) for all (x, y), Equations (1) and (2) will maintain G'(x, y) ≥ g'(x, y)
for all (x, y).

PROOF: Consider the pair of Equations (1) for G'(x, y) and g'(x, y). The cases of interest are

(1) G'(x, y) = Σᵢ G(i, x) − Σⱼ g(x, j) + g(x, y) < G(x, y),

(2) g'(x, y) = Σᵢ g(i, x) − Σⱼ G(x, j) + G(x, y) > g(x, y).

Ignoring the inequalities and subtracting the second equation from the first, we have

G'(x, y) − g'(x, y) = Σᵢ [G(i, x) − g(i, x)] + Σ_{j≠y} [G(x, j) − g(x, j)],

where the j = y terms of the outflow sums cancel against the added g(x, y) and G(x, y).
By hypothesis, each of the two summation terms on the right side is nonnegative; therefore, the dif-
ference on the left side is nonnegative.

Equations (2) are symmetric with (1) for calculations progressing from sink to source. A similar
exercise to the above would produce equivalent results.

That the hypothesis is true at the outset can be seen in the Initial Conditions established by
Condition No. 2. Recall that we insisted that c(x, y) ≥ 0.

LEMMA 3: When scanning a node x, the equations of Routine II always calculate Σⱼ G(x, j) ≥ Σᵢ g(i, x).

PROOF: Consider Equation (1b), which is

g'(x, y) = max [g(x, y); Σᵢ g(i, x) − Σⱼ G(x, j) + G(x, y)].

We restate the above inequality by changing signs. Now,

G(x, y) − g'(x, y) ≤ Σⱼ G(x, j) − Σᵢ g(i, x).

Lemma 2 states that G'(x, y) ≥ g'(x, y) and Lemma 1 assures us that G(x, y) ≥ G'(x, y). Therefore,
the left side is nonnegative. Consequently, Σⱼ G(x, j) ≥ Σᵢ g(i, x).

Analogous arguments to those above for Lemma 3 would prove its corollaries for the following:

Σᵢ G(i, x) ≥ Σⱼ g(x, j) by considering Equation (1a),

Σⱼ G(y, j) ≥ Σᵢ g(i, y) by considering Equation (2a), and

Σᵢ G(i, y) ≥ Σⱼ g(y, j) by considering Equation (2b).

THEOREM 1: Let G(x, y) and g(x, y) be the upper and lower bounds, respectively, as determined
by the Bounded Flow Routine II for all arcs (x, y) in G = [N; A]. Then any temporally repeated maximal
dynamic flow solution for this network will contain for all (x, y) a flow f(x, y) such that

g(x, y) ≤ f(x, y) ≤ G(x, y).

PROOF: Required flows are initially set in the inadmissible arcs according to the optimality
criteria, i.e., for those arcs where d(x, y) < 0, the minimum flow is equal to the arc capacity and for
those arcs where d(x, y) > 0, the maximum flow is equal to zero. Where d(x, y) = 0, the flow may be
anywhere between zero and the arc capacity. In the return arc from sink to source, we have set the
minimum and maximum equal to the maximal static flow for the system for the time period of interest.
These values do not change at any time. Initially for the admissible arcs the maximum flow is set at the
arc capacity and the minimum flow at zero, which are the broadest possible bounds since we insist that
0 ≤ f(x, y) ≤ c(x, y) for all (x, y). Clearly then, our theorem holds for the initial conditions.

Lemma 1 tells us that G(x, y) never increases, but tends to decrease, and g(x, y) never decreases,
but tends to increase, while Lemma 2 maintains that G(x, y) ≥ g(x, y) for all (x, y) at all times. Lemma 3
and its corollaries insure that the available flow into a node is at least equal to the required flow out of a
node and, conversely, the required flow into a node is no greater than the available flow out of a node for
all nodes at all times. Since initially the feasibility of meeting local conditions of optimality exists
everywhere, we will proceed by induction on the sequence of nodes to be scanned, i.e.,

s, x₁, x₂, . . ., t, where π(xᵢ) ≤ π(xᵢ₊₁).

Where equality holds between two or more nodes and the connecting arcs have zero traversal times,
there may be redundant scanning of any of the nodes, but Lemmas 1 through 3 will hold nonetheless
for each scan.


For the initial scanning of s, we know that G(s, x) and g(s, x) are all valid in the sense of the
theorem. Therefore the cases of interest are where G'(s, x) < G(s, x) and g'(s, x) > g(s, x). Accordingly,

G'(s, x) = G(t, s) − Σ_{j≠x} g(s, j),

or

G'(s, x) + Σ_{j≠x} g(s, j) = G(t, s).

Suppose we substitute

f(s, x) > G'(s, x) for G'(s, x).

Then the left side exceeds G(t, s),
a contradiction to conservation of flow at s and the relative flow conditions maintained by Lemma 3.

Similarly,

g'(s, x) + Σ_{j≠x} G(s, j) = g(t, s),

and we substitute f(s, x) < g'(s, x) for g'(s, x). Then the left side falls below g(t, s),

again a contradiction as above. Thus we see that G'(s, x) and g'(s, x) are valid upper and lower bounds,
respectively, for all x, whether or not they have been changed from their initial values.

Next consider x₁. If Σᵢ G(i, x₁) or Σᵢ g(i, x₁) are no longer their original values, we know from the
above that they are valid. Again, the equation of interest is

G'(x₁, y) + Σ_{j≠y} g(x₁, j) = Σᵢ G(i, x₁),

and a substitution f(x₁, y) > G'(x₁, y) constitutes a contradiction. Similarly,

g'(x₁, y) + Σ_{j≠y} G(x₁, j) = Σᵢ g(i, x₁).

A substitution f(x₁, y) < g'(x₁, y) produces a contradiction, and G'(x₁, y) and g'(x₁, y) are valid.

As we proceed, we see that this holds for any xⱼ, for the calculations are based on Σᵢ G(i, xⱼ) and
Σᵢ g(i, xⱼ) which, although possibly changed from their initial values, are known to be valid in the
sense of the theorem.

At the pivot node t, we reverse our sequence and proceed to s. Lemmas 2 and 3 insure that ultra-
conservative conditions hold at this crucial point, and a parallel argument to the s to t sequence would
produce equivalent results. One may make as many iterative passes from s to t and t to s as desired
without violating the conditions asserted by the theorem. This concludes our proof.

We now turn our attention to the Bounded Flow Routine III. The scanning process of Routine III
is basically an application of the Ford-Fulkerson (F-F) algorithm (Routine I) as it operates between
nonbreakthroughs. The maintenance of conservation of flow at every node and the maximizing proper-
ties of this algorithm are well established [2]. Our proof for Routine III then reduces to showing that
there exists a formal equivalence between our application and the standard conditions for the F-F
algorithm.
Consider STEP 2, which is concerned with determining the best G(x, y). G(x, y) has replaced
c(x, y) as the upper bound and g(x, y) has replaced zero as the lower bound. Justification for these
substitutions can be found in Theorem 1. Our new network has all the arcs and nodes of the old except
the arc being scanned. As in the F-F algorithm, only those arcs where d(x, y) = 0 are admissible for
labeling purposes because under the optimality criteria, it is only in these arcs that the flow can be
altered. We start with a feasible flow as does the F-F algorithm. The source for this network is t' and the
sink is s'. The source gets labeled with the maximum amount of flow that can be augmented in (s', t'),
i.e., G(x, y) − f(x, y) by Theorem 1. In the F-F algorithm, this is taken to be ∞ for it is not known
a priori what the maximum amount of flow augmentation is. The labeling rules are the same, as are the
rules for breakthrough and nonbreakthrough. There is a distinction in the replacement process, where
the flow is incremented or decremented in a sequence of arcs which go from sink to source. However,
the last arc in this sequence is the arc being scanned, which is not a part of the current network. When
the routine reaches nonbreakthrough, further flow augmentation is impossible and we have the maxi-
mum flow in the arc being scanned.

The argument for STEP 3 follows that for STEP 2 above. The source for this network is s' and the
sink is t'. The source gets a negative label with the maximum amount the flow in (s', t') can be reduced,
i.e., f(s', t') − g(s', t') by Theorem 1. On nonbreakthrough, the flow in (s', t') will have been decre-
mented to its absolute minimum with respect to optimality and we record the final g(x, y) = f(x, y).
We can now state the following theorem. 

THEOREM 2: Let G(x, y) and g(x, y) be the upper and lower bounds, respectively, as determined
by the Bounded Flow Routine III for all arcs (x, y) in G = [N; A]. Then the integers n, such that

g(x, y) ≤ n ≤ G(x, y),

provide an exhaustive set of valid arc flows for which there exists an integer, temporally repeated,
maximal dynamic flow solution for G = [N; A].


In Figure 1 is shown a simple network and a dynamic flow solution for the stabilization time of
P = 15. Following stabilization time, the static flow of 10 is repeated each time period and new solutions
are not necessary since the arc flows do not change in value. The small lower numbers in the nodes
are the node names and the larger upper numbers are the node numbers π(x)'s. The data in the arc
boxes are the following: upper left, capacity; upper right, transit time; and the lower number is the
arc flow f(x, y). Capacities and transit times are symmetric, e.g., c(x, y) = c(y, x).

Figure 1.

In Figure 2 is shown the bounded arc flow solution based on the flow solution in Figure 1. The net-
work data is the same as Figure 1 except the lower numbers in the arc boxes are the upper/lower bounds
G(x, y)/g(x, y).

In decomposing all alternative routes for their maximum optimal flow, we get the following set of
nine routes.

Figure 2.



[Table of the nine routes, with columns: Possible chain | Time length | P = 15 use | Max flow. The entries are not recoverable from the scan.]
These alternative routes offer a fair variety of different ways in scheduling a particular optimal solution. 
For example, we list below two different solutions for contrast. Here, again, the time span is 15. 
[Tables for Solution A and Solution B; the entries are not recoverable from the scan.]

Consider, for instance, that Nodes 1 and 2 are origins and Nodes 4 and 5 are destinations. Then if
there were some preference, not formally stated, for maximizing the origin-destination deliveries 1-4
and 2-5, one would choose Solution A. However, if the preferred pairings were 1-5 and 2-4, Solution B
is the best.


Some version of the Bounded Flow Algorithm has been in use at The George Washington University 
since late 1967. The algorithm and the associated computer codes have been revised several times with 
the objective of increasing their efficiency. Currently, the bounded arc flow computation takes approxi- 
mately one minute on a 500 arc network. The program is written in PL/1 for an IBM 360/50. 


The research was conducted as part of the Program in Logistics of the Institute for Management 
Science and Engineering, The George Washington University. The work was supported by the Office of 
Naval Research. 

Special recognition is due to Donald J. Hunt of the Program in Logistics who, since the very begin- 
ning of this development, has made material contributions to the power and efficiency of the computa- 
tional procedures. He is solely responsible for the large gains achieved in decreased running time which 
followed the original implementation of the algorithm. 

Thanks are also due to Raymond W. Lewis of the Program in Logistics for his valuable observations 
and suggestions during the various levels of algorithmic development. 


[1] Ford, L. R., Jr. and D. R. Fulkerson, "Constructing Maximal Dynamic Flows from Static Flows,"
Operations Research 6, 419-433 (1958).
[2] Ford, L. R., Jr. and D. R. Fulkerson, Flows in Networks (Princeton University Press, 1962).


Juan Prawda 

Tulane University 
New Orleans, Louisiana 


This paper extends Connors and Zangwill's work in network flows under uncertainty to
the convex costs case. In this paper the extended network flow under uncertainty algorithm
is applied to compute N-period production and delivery schedules of a single commodity
in a two-echelon production-inventory system with convex costs and low demand items. Given
an initial production capacity for N periods, the optimal production and delivery schedules
for the entire N periods are characterized by the flows through paths of minimal expected
discounted cost in the network.

As a by-product of this algorithm the multi-period stochastic version of the parametric
budget problem for the two-echelon production-inventory system is solved.


In a recent paper Connors and Zangwill [7] developed the Network Flow Under Uncertainty
(NFUU) or r-networks by allowing the requirements or availabilities at the nodes of a network to be
discrete random variables with known probability distributions. They extended the standard deter-
ministic multistage network flow problem introduced by Ford and Fulkerson [11]. The underlying
structure of network flow problems was exploited in Ref. [7] to produce both a new structure which is
not a deterministic network, but maintains many of its properties, and a new node which replicates
flows instead of conserving them. They called the former r-networks and the latter r-nodes. Construction
of an NFUU from a given N-period stochastic problem is not given in this paper. The reader is referred
to [7]. Given the N-period problem and convex objective criteria, we will develop an algorithm to
solve the network flow problem that minimizes expected cost.

Two applications of this algorithm are given: 

1) To compute optimal N-period production and delivery schedules of a single commodity in a
two-echelon production-inventory system with convex costs and low demand items; and,

2) To solve the parametric-budgetary problem in the multiperiod stochastic case, corresponding to 
the system described in (1). 

This paper is organized as follows: In section 2 the Convex Network Flow Under Uncertainty
Algorithm is stated, and its validity and convergence proven; in section 3 the N-period, two-echelon pro-
duction and delivery inventory problem is stated. In section 4 we extend the parametric budgetary
problem to the multiperiod, stochastic case, for the system considered in section 3.


Let G = (N, A) denote a Network Flow Under Uncertainty (NFUU), where N is a finite collection
of elements x, y, . . . and A is a finite subset of ordered pairs (x, y) of elements taken from N. N is
supposed to be of the form N = N₁ ∪ N₂ ∪ N₃ with Nᵢ ∩ Nⱼ = φ for i, j = 1, 2, 3, i ≠ j. The elements of
N₁ are called nodes, the elements r₁, r₂, . . . of N₂ are called replication nodes or r-nodes and the
elements c₁, c₂, . . . of N₃ are called collating nodes or c-nodes. Members of A are referred to as arcs.
All arcs will be supposed to be of the form (x, y) with x ≠ y, x, y in N. We exclude arcs (x, y) where
both x, y are in Nᵢ, i = 2, 3 and arcs going from r-nodes to c-nodes and vice versa.

If x is in N, we let a(x) ("after x") denote the set of all y in N for which (x, y) is in A, that is,

a(x) = {y ∈ N | (x, y) ∈ A}.

Similarly, we let b(x) ("before x") denote the set of all y in N for which (y, x) is in A, that is,

b(x) = {y ∈ N | (y, x) ∈ A}.

Given G, each arc (x, y) in A has associated with it a nonnegative real number q(x, y), called the
capacity of the arc (x, y) in A; a nonnegative integer f(x, y), called the flow of the arc (x, y) in A;
and a nonnegative real number g(x, y), called the expected discounted cost of (x, y) in A. Both f and
g are functions from A to the nonnegative reals, the former having nonnegative integers as its range.
Let s, called the source, and t, called the sink, be two distinguished elements of N.

Each rₖ in N₂ has a single input arc and several output arcs and possesses the following two
properties:

(1) f(x, rₖ) = f(rₖ, y) for all y in a(rₖ) and some x in b(rₖ),

(2) g(x, rₖ) ≥ 0, g(rₖ, y) = 0 for all y in a(rₖ) and some x in b(rₖ).

Property (1) merely states that flow on each of the output arcs of an r-node must be identical with that
on the input arc, and (2) states that all the outgoing arcs of an r-node have an expected discounted cost
of zero. Each cₖ in N₃ is essentially the negative of an r-node, that is, it has several input arcs and a single
output arc and possesses the following two properties:

(3) f(y, cₖ) = f(cₖ, x) for all y in b(cₖ) and some x in a(cₖ),

(4) g(cₖ, x) ≥ 0, g(y, cₖ) = 0 for all y in b(cₖ) and some x in a(cₖ).

Properties (3) and (4) are, respectively, the negative of (1) and (2).
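The replication property (1) is mechanical to verify. A minimal sketch, assuming flows are kept in a dict keyed by arc and that the incidence lists `ins`/`outs` have been precomputed (all names are ours):

```python
def replicates(node, f, ins, outs):
    """Property (1): every output flow of an r-node equals its single input flow."""
    (x,) = ins[node]   # an r-node has exactly one input arc; unpacking enforces this
    return all(f[(node, y)] == f[(x, node)] for y in outs[node])
```

The mirror check for property (3) of a c-node swaps the roles of `ins` and `outs`.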
It is shown in [7] that

(5) G = ∪ᵢ Gⁱ ∪ N₂ ∪ N₃,

where Gⁱ = (N₁ⁱ, Aⁱ), i = 1, . . ., M, M is a finite integer and Gⁱ ∩ Gʲ = φ for i, j = 1, . . ., M, i ≠ j.

Each Gⁱ, i = 1, . . ., M is called a subnetwork and it is an ordinary network consisting of ordinary
nodes xⁱ, yⁱ, . . . in N₁ⁱ and ordinary arcs (xⁱ, yⁱ) in Aⁱ where xⁱ, yⁱ are in N₁ⁱ. In each subnetwork
Gⁱ the total inflow equals the total outflow, and the flow is conserved at each node in N₁ⁱ (i = 1, . . ., M).
Two subnetworks, say Gⁱ and Gʲ, i, j = 1, . . ., M, i ≠ j, are connected if there exists at least one
rₖ in N₂ or cₖ in N₃ for which (xⁱ, rₖ) and (rₖ, yʲ) are in A when xⁱ ∈ N₁ⁱ and yʲ ∈ N₁ʲ, or (xʲ, cₖ) and (cₖ, yⁱ)
are in A when xʲ ∈ N₁ʲ and yⁱ ∈ N₁ⁱ. Let

N₂ⁱ = {rₖ ∈ N₂ | (rₖ, xⁱ) or (xⁱ, rₖ) ∈ A, xⁱ ∈ N₁ⁱ},

N₃ⁱ = {cₖ ∈ N₃ | (cₖ, xⁱ) or (xⁱ, cₖ) ∈ A, xⁱ ∈ N₁ⁱ}

be the sets of r and c nodes that connect subnetwork i (i = 1, . . ., M) with the rest of the NFUU.

From the NFUU G = (N; A) (Figure 1) we observe that N₁ = {1, 2, . . ., 17, 18}, N₂ = {r₁, r₂, r₃},
N₃ = {c₁, c₂, c₃}, and M = 8.

Figure 1. A network flow under uncertainty or r-network. 

The algorithm to follow is based on the works of Connors and Zangwill [7] and Hu [14]. It is very
closely related to the works of Beale [2], Busacker and Gowen [4], Hu [15], and Zangwill [24]. This
algorithm, utilizing the decomposition (5), iterates by determining shortest routes, or routes of mini-
mal expected discounted cost, along the subnetworks (which are ordinary networks), forcing one unit
of flow on this route and making appropriate adjustments for the r and c nodes.


Let h[f(x, y)] be a nonnegative convex function of f(x, y) for all (x, y) in A, such that h(0) = 0
and the flow function f(x, y) is required to have nonnegative integers as its range.
The cost function for the entire network is

Σ over all (x, y) ∈ A of h[f(x, y)].

This cost function is a sum of convex functions and thus convex. Let

(6) h⁺[f(x, y)] = h[f(x, y) + 1] − h[f(x, y)]

for f(x, y) ≥ 0 and all (x, y) in A, and

(7) h⁻[f(x, y)] = h[f(x, y) − 1] − h[f(x, y)]

for f(x, y) > 0 and all (x, y) in A.

Expression (6) defines the up-cost h⁺ of an arc and (7) the down-cost h⁻ of an arc. It is shown in [14]
that h⁺( ) ≥ 0 and h⁻( ) ≤ 0; h⁺(a) ≤ h⁺(b) for a < b and |h⁻(a)| ≤ |h⁻(b)|. A particular flow called a
path-flow is a flow with f(s, x) = f(x, y) = . . . = f(z, t) = 1 and f(u, w) = 0 for all u, w ≠ s, x, y,
. . ., z, t. If the cost of a flow with value v is known and we superimpose a path flow on this given
flow, the resulting flow has value v + 1. The up-cost h⁺ is used if the arc flow of the path flow is of the same direction
as that of the arc flow of the flow with value v, and the down-cost h⁻ is used if the two flows are of opposite directions.
The sum of the h⁺ and h⁻ used in the path flow is called the incremental cost of the path flow.
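Expressions (6) and (7) translate directly into code. A minimal sketch, taking the convex cost h as a Python callable (the function names are ours):

```python
def up_cost(h, flow):
    """(6): incremental cost of adding one unit of flow to the arc."""
    return h(flow + 1) - h(flow)

def down_cost(h, flow):
    """(7): cost change from removing one unit (flow > 0); nonpositive for convex h."""
    return h(flow - 1) - h(flow)
```

For the convex cost h(f) = f², up_cost at flow 2 is 9 − 4 = 5, while down_cost at flow 2 is 1 − 4 = −3, illustrating that up-costs grow and down-cost magnitudes grow with the flow, as stated above.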

An iteration of the convex NFUU algorithm first requires construction of a modified r-network
from the current flow in the original r-network. A shortest route algorithm is then applied to determine
the shortest route from source to sink in the modified r-network using the up-cost and down-cost
(h⁺( ) and h⁻( ), respectively) of an arc as its length. One unit of flow is then forced from source to
sink in the original network along a route corresponding to the shortest path obtained in the modified
r-network. The up-cost and down-cost of all the arcs in the modified r-network are redefined based on
the new flow pattern of the original network. This cycle is repeated until the amount of flow at t in the
original network is I.*

Given G (the original network) and I (a positive integer corresponding to an input flow to the net- 
work), the precise algorithmic statements for the convex NFUU algorithm are: 

STEP 0 (Initialization): Set f(x, y) = 0 for all (x, y) in A.

STEP 1 (Network Modification): Given the current flow in the original network f(x, y) for all
(x, y) in A, define a modified r-network as follows:

a) If 0 ≤ f(x, y) < q(x, y), leave the arc (x, y) in the modified r-network with cost h⁺[f(x, y)] as
defined in (6);

b) If f(x, y) = q(x, y), delete the arc (x, y) from the modified r-network;

c) If 0 < f(x, y) and

i) x, y are in N₁, add a reverse arc (y, x) in the modified r-network with cost h⁻[f(x, y)] as
defined in (7),

ii) x is in N₁ and y is in N₂, add reverse arcs (y, x) and (z, y) in the modified r-network for all z in
a(y), the former with cost h⁻[f(x, y)] as defined in (7), and the latter with cost zero,

iii) x is in N₃ and y is in N₁, add reverse arcs (y, x) and (x, z) in the modified r-network for all z in
b(x), the former with cost h⁻[f(x, y)] as defined in (7), and the latter with cost zero.

Both b) and c) must be done if 0 < f(x, y) = q(x, y) for all (x, y) in A.

*The definition of I is given in the next sentence.

STEP 2 (Shortest Route): Determine the shortest route, or routes of minimal expected discounted
cost, from s to t in the modified r-network. Use properties (1) and (2) for r-nodes and (3) and (4) for c-nodes.
Apply any shortest route algorithm [9, 11] with h⁺( ) and h⁻( ) as lengths in the arcs of the modified
r-network.

STEP 3 (Flow Augmentation): Send one unit of flow from s to t in the original network along the route
corresponding to the shortest path just obtained in STEP 2, that is, along the path whose incremental
cost in the modified r-network relative to the existing flow in the original network is minimum.

STEP 4 (Iteration and Stopping Rule): If the amount of flow at t in the original network is I, stop. Other-
wise return to STEP 1 with the current flow.

If, during the application of STEP 2, no shortest path exists from s to t in the modified r-network, the
original problem is infeasible.
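The iteration cycle of STEPS 0 through 4 can be sketched for an *ordinary* network, i.e., with the r-node and c-node adjustments of STEP 1c(ii)-(iii) omitted, and assuming for brevity a single convex cost function h shared by all arcs and at most one of (x, y), (y, x) present as an arc. Bellman-Ford is used for STEP 2 because the down-costs are negative; all identifiers are ours:

```python
def min_cost_flow(nodes, q, h, s, t, I):
    f = {a: 0 for a in q}                                    # STEP 0
    for _ in range(I):
        # STEP 1: modified network with up-costs and (negative) down-costs
        mod = {}
        for (x, y) in q:
            if f[(x, y)] < q[(x, y)]:
                mod[(x, y)] = h(f[(x, y)] + 1) - h(f[(x, y)])   # up-cost (6)
            if f[(x, y)] > 0:
                mod[(y, x)] = h(f[(x, y)] - 1) - h(f[(x, y)])   # down-cost (7)
        # STEP 2: cheapest s-t route; Bellman-Ford tolerates negative lengths
        dist = {v: float('inf') for v in nodes}
        dist[s] = 0
        pred = {}
        for _ in range(len(nodes) - 1):
            for (x, y), c in mod.items():
                if dist[x] + c < dist[y]:
                    dist[y] = dist[x] + c
                    pred[y] = x
        if dist[t] == float('inf'):
            raise ValueError('infeasible')                   # no s-t route left
        # STEP 3: push one unit along the route (reverse arcs cancel flow)
        v = t
        while v != s:
            u = pred[v]
            if (u, v) in f:
                f[(u, v)] += 1
            else:
                f[(v, u)] -= 1
            v = u
    return f                                                 # STEP 4: I units sent
```

On two parallel two-arc paths with cost h(f) = f² per arc, sending I = 2 units splits the flow one unit per path, since pushing the second unit down an occupied path would incur the larger up-cost h(2) − h(1).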

The validity and convergence of the algorithm is proven in [7] for the linear case. The next theorem
proves the validity and convergence of the algorithm for the convex case.

Let j = 1 stand for the source s and j = m for the sink t. We will next prove that the convex NFUU
algorithm is equivalent to computing a flow vector f = (fᵢ), i = 1, . . ., k (k is the total number of ele-
ments in A) and a vector b = (bⱼ), j = 2, . . ., m − 1, whose components are the amounts of flow at
the nodes (b₁ = I, b_m = −I), which minimize h(f) subject to

(8) Df = b, If ≤ q,

where Df = b merely states that:

a) the total inflow and total outflow of every node in N₁ must equal the amount of flow at the node,

b) every incoming flow (outgoing flow) of an r (c) node equals the amount of flow at the node,

c) every replication* (collating*) flow of an r (c) node equals the amount of flow at the node,

and q = (qᵢ), i = 1, . . ., k is the vector of arc capacities, I is a k × k identity matrix and h(f) = Σᵢ hᵢ(fᵢ),
where each hᵢ(fᵢ) is a real-valued convex cost function of fᵢ, i = 1, . . ., k.

THEOREM 1: Assume that at the end of iteration s (s ≥ 0) of the convex NFUU algorithm, f^s is a
feasible solution to the convex programming problem

(9) Min h(f)

subject to

Df = b^s, If ≤ q.

We let f⁰ = 0 and f^s < I, where I is the input flow to the NFUU.

*The flow on each of the outgoing (incoming) arcs of an r (c) node is called replication (collating) flow.
For this f^s suppose f̄^s is optimal to

(10) Min h̄(f̄)

subject to

Df̄ = η, −If^s ≤ If̄ ≤ q − If^s,

where

f̄ᵢ = 0 if arc i is not in the path flow, or 1 if arc i is in the path flow,

for i = 1, . . ., k, and

ηⱼ = 0 if node j is not in the path flow, or 1 if node j is in the path flow,

for j = 1, . . ., m. Then f^(s+1) = f^s + f̄^s is optimal for

(11) Min h(f)

subject to

Df = b^s + η, If ≤ q.
PROOF: For h(f) linear, the proof is in [7]. Assume h(f) convex with h(0) = 0. First we will prove f^(s+1)
is feasible for (11). Adding the first m constraints of (9) and (10) yields

D(f^s + f̄^s) = b^s + η,

or

Df^(s+1) = b^s + η.

f^(s+1) is always bounded below by zero, since f^s ≥ 0 from (9) and f̄^s ≥ −f^s from (10). The last 2k con-
straints of (10) yield

−If^s ≤ If̄^s ≤ q − If^s;

adding If^s to the above inequality,

−If^s + If^s ≤ If̄^s + If^s ≤ q − If^s + If^s.

The above inequality yields

0 ≤ If^(s+1) ≤ q,

and thus f^(s+1) is feasible to (11).

Next, we prove the optimality of f^(s+1). Let the cost associated with problem (9) be h(f^s) and let
h̄(f̄^s) be the optimal incremental cost of problem (10) corresponding to the optimal path flow f̄^s. It
follows from the optimality of f̄^s that h̄(f̄^s) ≤ h̄(f̄) for any path flow f̄ feasible to problem (10).

Since the cost of the new flow f^(s+1) in the original network equals the cost of the old existing flow
f^s in the original network plus the incremental cost of the path flow f̄^s in the modified r-network, it
follows that

h(f^(s+1)) = h(f^s + f̄^s) = h(f^s) + h̄(f̄^s) ≤ h(f^s) + h̄(f̄) = h(f^s + f̄) = h(f)

for any feasible f to problem (11). Then f^(s+1) is optimal to (11). Q.E.D.

The last theorem proves that in terms of the NFUU, at the end of each iteration, the flow will be
optimal for the amount thus far placed into the source of the NFUU. The convergence of the algorithm
follows from the fact that f^(s+1) is bounded above by a finite integer I and at each iteration the flow
value increases by one.

Next we suggest the application of the above convex NFUU algorithm to the solution of two prob- 
lems, one given in section 3 and the other one in section 4. 


Interest in multiechelon inventory systems has been spurred by the existence of large military and private-industry logistics networks. A number of papers on single- and multiproduct, multiinstallation inventory models have been published. A comprehensive review of these topics can be found in the excellent published bibliographies of Iglehart [17], [18], Scarf, Gilford, and Shelley [20], and Veinott [22].

Several approaches have been used to compute optimal $N$-period reordering points in the preceding multiechelon inventory systems. For instance, Bessler and Veinott [3], using the assumption that stock left over (backlogged) at the end of the $N$ periods in each facility can be salvaged (purchased) at the same stationary unit price, decompose an $N$-variable linear cost function into the sum of $N$ one-variable linear functions, and a stationary policy given by a critical vector is shown to be optimal.

524 J. PRAWDA

If Bessler and Veinott's [3] assumption is relaxed, the $N$-period problem will, in general, not decompose into $N$ one-period problems, and dynamic programming is used to compute the optimal policy. Others, such as Clark and Scarf [5, 6], have used dynamic programming. However, its use has been shown to be computationally infeasible even for simpler problems than the one considered by Bessler and Veinott. (See [17, 18].)

The objective of this section is to suggest the use of the preceding convex NFUU algorithm to solve $N$-period, multiechelon production and delivery inventory systems. It is the structure of the NFUU networks that allows some computational improvement, relative to other techniques used to solve similar systems such as dynamic programming, in obtaining optimal production and delivery schedules for $N$-period, multiechelon stochastic inventory problems with low item demands.

This paper is concerned with the problem of scheduling the production $x_{01}, x_{02}, \ldots, x_{0N}$ and allocation $x_{11}, \ldots, x_{1N}, x_{21}, \ldots, x_{2N}, \ldots, x_{n1}, \ldots, x_{nN}$ of a single product in facilities $1, \ldots, n$ in successive time periods $1, 2, \ldots, N$ so as to minimize the total expected discounted cost over the $N$ periods. The requirements of each facility are discrete random variables, each of which has a known probability mass function.

Figure 2 illustrates a two-echelon system consisting of a plant, a warehouse 0, and n facilities 
numbered 1,2, . . ., n. Although we are interested in more general multiechelon systems, the preceding 
one will suffice to illustrate our approach. Some remarks concerning the generalization to more complex 
multiechelon systems are given at the end of this section. 



Figure 2. [Figure not reproduced: a two-echelon system with a plant, warehouse 0, and facilities $1, \ldots, n$; $t = 1, \ldots, N$.]

At the beginning of period one we consider a known production capacity for the $N$ periods. Let $I$ denote the production capacity. The production capacity could be present in this problem because of restrictions on the raw material needed to produce the given single commodity. Uncertain requirements in facility $i$ ($i = 1, \ldots, n$) in each period are satisfied insofar as possible from stock on hand at the beginning of the period in that facility and from the allocation and production of the commodity at the beginning of the period. Requirements which cannot be met in a given period (because, for example, of limited production) are backlogged until they can be satisfied by subsequent production or allocation in future periods.

Let $a_{it}$ ($i = 1, \ldots, n$; $t = 1, \ldots, N$) be a parameter associated with each facility $i$ in any period $t$. This parameter will take the following values:

$a_{it} = 1, 2, \ldots, n_{it}$ for $i = 1, \ldots, n$ and $t = 1, \ldots, N$.

Let $\{D_{it},\; i = 1, \ldots, n;\; t = 1, \ldots, N\}$ be a family of discrete, nonnegative random variables. For a fixed $i$ and $t$, $D_{it}$ takes on values in $\{d_1, d_2, \ldots, d_{n_{it}}\}$, a set of nonnegative real numbers. Let

$P_{a_{it}} = P\{D_{it} = d_{a_{it}}\}$ for all $a_{it}$.

The sequence $\{P_{a_{it}}\}$ of real numbers will be a probability distribution of $D_{it}$ with $P_{a_{it}} \ge 0$ and

$\sum_{a_{it}=1}^{n_{it}} P_{a_{it}} = 1.$
This distribution is assumed to be known. $D_{it}$ is not necessarily independent* between any two facilities or identically distributed for successive periods. Let $\omega_{it} \triangleq (a_{i1}, a_{i2}, \ldots, a_{it})$, $i = 1, \ldots, n$ and $1 \le t \le N$, be the index associated with the random variables defined below. This notation identifies the sequence of realizations in facility $i$ up to period $t$. Thus, for example, $\omega_{i3} \triangleq (2, 1, 5)$ denotes the following sequence of events in facility $i$: in period 1 realization 2, in period 2 realization 1, and in period 3 realization 5. Let

$\omega_{0t} \triangleq (\omega_{11}, \omega_{21}, \ldots, \omega_{n1}, \ldots, \omega_{1t}, \omega_{2t}, \ldots, \omega_{nt}).$

We let $D_{it}^{(\omega_{it})}$ for $i = 1, \ldots, n$ and $t = 1, \ldots, N$ be the vector of realizations caused by the stochastic requirements in facility $i$ up to period $t$. It is conditioned on all previous realizations in that facility in previous time periods, that is, $D_{i1}^{(\omega_{i1})}, \ldots, D_{i,t-1}^{(\omega_{i,t-1})}$, and occurs with conditional probability $P(\omega_{it})$, where

$P(\omega_{it}) = P\{D_{it} = d_{a_{it}} \mid D_{i,t-1} = d_{a_{i,t-1}}, \ldots, D_{i1} = d_{a_{i1}}\}.$

Let $x_{0t}^{(\omega_{0t})}$ $(\ge 0)$ be the production completed in the plant (warehouse) at the beginning of period $t$ given the sequence of random events $\omega_{0t}$, and $x_{it}^{(\omega_{it})}$ $(\ge 0)$ $(i = 1, \ldots, n)$ the allocation completed in facility $i$ at the beginning of period $t$ $(t = 1, \ldots, N)$ given the sequence of random events $\omega_{it}$. Let $I_{0t}^{(\omega_{0t})}$ $(t = 1, \ldots, N)$ be the inventory at the end of period $t$ in the warehouse given the sequence $\omega_{0t}$, and $I_{it}^{(\omega_{it})}$ $(i = 1, \ldots, n;\; t = 1, \ldots, N)$ the inventory at the end of period $t$ in each facility $i$ given the sequence $\omega_{it}$. We will assume that the initial inventory at all facilities and the warehouse is zero at the beginning of period one.

In order to simplify the statement of the problem, we provisionally suppress the index $\omega_{it}$ and merely refer to the random variables $x_{it}$ and $I_{it}$, $i = 0, 1, \ldots, n$, $t = 1, \ldots, N$.

*If, for a fixed $i$, $D_{it}$ $(t = 1, \ldots, N)$ is a sequence of dependent random variables, then the marginal distribution $P_{a_{ik}}$ $(1 \le k \le N)$ can be obtained from the given joint probability distribution.


We will assume that the lead time in production and delivery to the $n$ facilities is zero. The inventory level equation becomes

$I_{it} = \begin{cases} \sum_{h=1}^{t} \left( x_{0h} - \sum_{k=1}^{n} x_{kh} \right) & \text{for } i = 0 \\ \sum_{h=1}^{t} (x_{ih} - D_{ih}) & \text{for } i = 1, \ldots, n, \end{cases}$

where $t = 1, \ldots, N$. Let $\beta_i$ be a nonnegative integer denoting the number of periods of backlog permitted for facility $i$; thus

$I_{it} \ge -\sum_{k=t-\beta_i+1}^{t} D_{ik}$ for all $i = 1, \ldots, n$.
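The inventory level equation can be evaluated directly from a schedule and a realized demand sequence. The following sketch simply accumulates the sums above; the data layout (dictionaries of per-period lists, with period $t$ stored at index $t-1$) and all names are hypothetical illustration, not part of the paper's formulation.

```python
def inventory_levels(x, D, n, N):
    """End-of-period inventories per the text: the warehouse (i = 0)
    accumulates production minus allocations to facilities 1..n; each
    facility i >= 1 accumulates allocations minus realized demand.
    x[i] and D[i] are lists of length N (period t at index t - 1)."""
    I = {i: [0.0] * N for i in range(n + 1)}
    for t in range(N):
        # warehouse: sum over h <= t of (x_0h - sum_k x_kh)
        I[0][t] = sum(x[0][h] - sum(x[k][h] for k in range(1, n + 1))
                      for h in range(t + 1))
        # facilities: sum over h <= t of (x_ih - D_ih)
        for i in range(1, n + 1):
            I[i][t] = sum(x[i][h] - D[i][h] for h in range(t + 1))
    return I
```

Negative entries in a facility's inventory represent backlog, which the constraint above limits to at most $\beta_i$ periods of demand.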

Note that, in general, we would have

$\sum_{t=1}^{N} x_{0t} \le I,$

which implies that it may be optimal not to use all the production capacity $I$ through the $N$ periods.* Let
$k_{it}$ be the known capacity of the production line for $i = 0$ and of the transportation facilities for $i > 0$ $(i = 1, \ldots, n)$ during period $t$ $(t = 1, \ldots, N)$. Let $Q_{it}$ be the known storage capacity for the warehouse $(i = 0)$ and the $n$ facilities $(i = 1, \ldots, n)$ during period $t$ $(t = 1, \ldots, N)$. Thus we will require

$x_{it} \le k_{it}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it} \le Q_{it}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$.



Let

$z = \left( x_{01}^{(\omega_{01})}, \ldots, x_{0N}^{(\omega_{0N})},\; x_{11}^{(\omega_{11})}, \ldots, x_{1N}^{(\omega_{1N})}, \ldots, x_{n1}^{(\omega_{n1})}, \ldots, x_{nN}^{(\omega_{nN})} \right)$

be the schedule vector for the entire system, given the sequence of realizations in all facilities $i$, $i = 1, \ldots, n$, up to period $N$.

We have the following costs in each period: production, shipping, holding, and shortage (the last 
one due to backlogged demand). The preceding costs are assumed to be convex functions of the quan- 
tities produced and delivered at the beginning of the period and of the quantities stored or backlogged 

*In terms of the NFUU this can be accomplished by arcs representing unused production capacity in the plant at each period $t$, $t = 1, 2, \ldots, N$.


at the end of the period, respectively. The cost functions for successive periods need not be the same. 

Let $\delta_t$, $0 \le \delta_t \le 1$, be the discount factor for period $t$. Let $\gamma_1 = 1$ and $\gamma_t = \delta_1 \delta_2 \cdots \delta_{t-1}$ for $t > 1$. The total expected discounted cost $F(z)$ is defined to be the sum of the following expected discounted costs:

i) Total expected discounted production and holding cost in the warehouse:

$\sum_{t=1}^{N} \gamma_t \left[ C_t(x_{0t}) + H_{0t}(I_{0t}) \right].$

ii) Total expected discounted transportation, holding, and penalty cost in all facilities:

$\sum_{i=1}^{n} \sum_{t=1}^{N} \gamma_t \left[ T_{it}(x_{it}) + H_{it}(I_{it}) \right],$

where

$H_{it} = \max \{ h_{it}(I_{it}),\; p_{it}(I_{it}) \},$

and $h_{it}(\cdot)$, $p_{it}(\cdot)$ are convex functions of their arguments satisfying

$h_{it}(I_{it}) = p_{it}(I_{it}) = 0$ if $I_{it} = 0$.

$h_{it}(\cdot)$ and $p_{it}(\cdot)$ are, respectively, the expected holding and penalty costs,* for $i = 0, \ldots, n$ and $t = 1, \ldots, N$. $C_t(\cdot)$, $h_{it}(\cdot)$, $p_{it}(\cdot)$, and $T_{it}(\cdot)$ are convex functions of their respective arguments with $C_t(0) = h_{it}(0) = p_{it}(0) = T_{it}(0) = 0$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$.

Then the problem can be stated as: given the uncertain market requirements for each of the $n$ facilities over the next $N$ periods and the production capacity for the $N$ periods, find a production-allocation schedule $z$, called optimal, which minimizes the total expected discounted cost

$F(z) = \sum_{t=1}^{N} \gamma_t \left[ C_t(x_{0t}) + H_{0t}(I_{0t}) + \sum_{i=1}^{n} \{ T_{it}(x_{it}) + H_{it}(I_{it}) \} \right]$

subject to

$\sum_{t=1}^{N} x_{0t} \le I,$

*$H_{it}(I_{it})$ may not be convex for $I_{it} = 0$. However, in terms of the network flow approach there are going to be arcs with expected cost $h_{it}(\cdot)$ corresponding to storage of inventory in facility $i$ at the end of period $t$ and different arcs with expected cost $p_{it}(\cdot)$ corresponding to backlogged inventory in facility $i$ at the end of period $t$. Thus $H_{it}(I_{it})$ is never used explicitly in the NFUU.


$0 \le x_{it} \le k_{it}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it} \le Q_{it}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it} = \begin{cases} \sum_{h=1}^{t} \left( x_{0h} - \sum_{k=1}^{n} x_{kh} \right) & \text{for } i = 0 \\ \sum_{h=1}^{t} (x_{ih} - D_{ih}) & \text{for } i = 1, \ldots, n, \end{cases}$

$I_{it} \ge -\sum_{k=t-\beta_i+1}^{t} D_{ik}.$

In order to solve the above problem we suggest that the above two-echelon, multiperiod, stochastic production-delivery problem be rewritten in terms of the NFUU (see [7]) and that the convex NFUU algorithm (described in section 2) then be used to compute the optimal production and delivery schedules. Since the NFUU decomposes into a set of subproblems which are small network flow problems, this algorithm seems more attractive than dynamic programming, especially for multiechelon inventory systems with low-demand items (0-1 demands) and a small number of time periods. It was shown in Ref. [7] that the amount of computer storage required by the NFUU algorithm is proportional only to $n$, the dimension of the one-stage problem.

The approach suggested in this paper can be applied to more general multiechelon systems than the one depicted in Figure 2 (for example, systems where transshipments between facilities are allowed), with a consequent increase in the number of arcs and nodes in the NFUU.


Suppose that a fixed budget of $v$ dollars can be allocated among the production-line, transportation, and storage facilities of the existing production-delivery inventory system for the purpose of increasing the production capacity $I$ for the $N$ periods. The cost of increasing the capacity of the production line $(i = 0)$ and transportation facilities $(i = 1, \ldots, n)$ during period $t$ $(t = 1, \ldots, N)$ is $v_{it}$ dollars per unit increase. The cost of increasing storage capacity at the warehouse $(i = 0)$ and the $n$ facilities $(i = 1, \ldots, n)$ during period $t$ $(t = 1, \ldots, N)$ is $v_{s_{it}}$ dollars per unit increase.

Let $w_{it}$ and $w_{s_{it}}$ be decision variables corresponding, respectively, to the amount of additional capacity to be built in the production-line $(i = 0)$ and transportation $(i = 1, \ldots, n)$ facilities, and in the storage facilities, during period $t$ $(t = 1, \ldots, N)$, dependent upon the sequence $\omega_{it}$ of random events in facility $i$ up to period $t$. Then the problem of increasing the production capacity for the $N$ periods to, say, $I'$ $(I' > I)$ is to minimize the following expected expansion cost:


$\min \sum_{i=0}^{n} \sum_{t=1}^{N} \left( v_{it} \cdot P(\omega_{it}) \cdot w_{it}^{(\omega_{it})} + v_{s_{it}} \cdot P(\omega_{it}) \cdot w_{s_{it}}^{(\omega_{it})} \right)$

subject to

$\sum_{t=1}^{N} x_{0t}^{(\omega_{0t})} \le I',$

$0 \le x_{it}^{(\omega_{it})} \le k_{it} + w_{it}^{(\omega_{it})}$ for $i = 0, 1, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it}^{(\omega_{it})} \le Q_{it} + w_{s_{it}}^{(\omega_{it})}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it}^{(\omega_{it})} = \begin{cases} \sum_{h=1}^{t} \left( x_{0h}^{(\omega_{0h})} - \sum_{k=1}^{n} x_{kh}^{(\omega_{kh})} \right) & \text{for } i = 0 \\ \sum_{h=1}^{t} \left( x_{ih}^{(\omega_{ih})} - D_{ih}^{(\omega_{ih})} \right) & \text{for } i = 1, \ldots, n, \end{cases}$

$I_{it}^{(\omega_{it})} \ge -\sum_{k=t-\beta_i+1}^{t} D_{ik}^{(\omega_{ik})}.$

A problem related to the one just given would be to maximize the production capacity $I'$ (now a decision variable) for the $N$ periods with a fixed budget of $v$ dollars. This problem is stated as

$\max I'$

subject to

$\sum_{i=0}^{n} \sum_{t=1}^{N} \left( v_{it} \cdot P(\omega_{it}) \cdot w_{it}^{(\omega_{it})} + v_{s_{it}} \cdot P(\omega_{it}) \cdot w_{s_{it}}^{(\omega_{it})} \right) = v,$

$0 \le x_{it}^{(\omega_{it})} \le k_{it} + w_{it}^{(\omega_{it})}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it}^{(\omega_{it})} \le Q_{it} + w_{s_{it}}^{(\omega_{it})}$ for $i = 0, \ldots, n$ and $t = 1, \ldots, N$,

$I_{it}^{(\omega_{it})} = \begin{cases} \sum_{h=1}^{t} \left( x_{0h}^{(\omega_{0h})} - \sum_{k=1}^{n} x_{kh}^{(\omega_{kh})} \right) & \text{for } i = 0 \\ \sum_{h=1}^{t} \left( x_{ih}^{(\omega_{ih})} - D_{ih}^{(\omega_{ih})} \right) & \text{for } i = 1, \ldots, n. \end{cases}$

In terms of the NFUU $G = [N; A]$ the preceding two problems are seen to be an extension of the deterministic parametric budget problem solved by Fulkerson [13] and Hu [14, 16] to the multiperiod, stochastic case.

The algorithm for solving the preceding two problems now follows:

STEP 0 (Initialization): Set $f(x, y) = 0$ for all $(x, y)$ in $A$.

STEP 1 (Network Modification): Given $f(x, y)$ for all $(x, y)$ in $A$, define a modified NFUU as follows:

a) If $f(x, y) < q(x, y)$ then $\bar{h}[f(x, y)] = 0$;

b) If $f(x, y) \ge q(x, y)$ then $\bar{h}[f(x, y)] = g(x, y)$;

c) If $0 < f(x, y) \le q(x, y)$ then $\underline{h}[f(x, y)] = 0$;

d) If $f(x, y) > q(x, y)$ then $\underline{h}[f(x, y)] = -g(x, y)$;

where $g(x, y)$ is the capacity expansion unit cost for arc $(x, y)$ in $A$ and $q(x, y)$ is the original capacity of arc $(x, y)$ in $A$. Obviously properties (1) and (2) of $r$-nodes and (3) and (4) of $c$-nodes must hold.

STEP 2 (Shortest Route): Send one unit of flow from $s$ to $t$ in the original network along a route corresponding to the shortest path just calculated in the modified network, that is, the path in the modified network whose incremental cost is minimum. Apply any shortest route algorithm [9, 11] with $\bar{h}(\cdot)$ and $\underline{h}(\cdot)$ as lengths.

STEP 3 (Flow Augmentation and Stopping Rule): If the amount of flow at $t$ is $I'$ in the original network or the total amount of money used up is $v$, stop; otherwise return to Step 1 with the current flow.

It is obvious that*:

$w_{it}$ or $w_{s_{it}} = f(x, y) - q(x, y)$ if $f(x, y) > q(x, y)$ for some $(x, y)$ in $A$, and

$w_{it}$ or $w_{s_{it}} = 0$ if $f(x, y) \le q(x, y)$ for some $(x, y)$ in $A$.
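Steps 0-3 can be sketched as a successive-shortest-path loop in which an arc's length is zero while its flow is below the original capacity $q(x, y)$ and equals the expansion unit cost $g(x, y)$ beyond it, stopping when the target flow $I'$ is reached or the budget $v$ would be exceeded. This is a simplified sketch (forward arcs only, no reverse arcs, hypothetical names and data), not the full NFUU construction.

```python
from math import inf

def expand_capacity(nodes, arcs, q, g, s, t, target, budget):
    """Steps 0-3 sketch: unit length is 0 below the original capacity
    q(x, y) and g(x, y) (the expansion unit cost) beyond it."""
    f = {a: 0 for a in arcs}       # Step 0: zero flow
    spent = 0.0
    sent = 0
    while sent < target:
        # Steps 1-2: modified lengths, then shortest s-t path (Bellman-Ford)
        dist = {v: inf for v in nodes}
        pred = {}
        dist[s] = 0.0
        for _ in range(len(nodes) - 1):
            for (u, v) in arcs:
                length = 0.0 if f[(u, v)] < q[(u, v)] else g[(u, v)]
                if dist[u] + length < dist[v]:
                    dist[v] = dist[u] + length
                    pred[v] = u
        if dist[t] == inf or spent + dist[t] > budget:
            break                   # Step 3: no path, or budget v exhausted
        spent += dist[t]
        sent += 1
        v = t
        while v != s:               # augment one unit along the path
            f[(pred[v], v)] += 1
            v = pred[v]
    w = {a: max(0, f[a] - q[a]) for a in arcs}
    return f, w, spent
```

The expansions are then read off as $w = \max(0,\; f - q)$ per arc, mirroring the relation stated in the text.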


My sincere thanks to Professors Gordon P. Wright and Larry R. Arnold for their helpful comments.

[1] Arrow, K., S. Karlin, and H. Scarf, (eds.), Studies in the Mathematical Theory of Inventory and 
Production (Stanford University Press, Stanford Calif., 1958). 

[2] Beale, E. M. L., "An Algorithm for Solving the Transportation Problem when the Shipping Cost 
over each Route is Convex," Nav. Res. Log. Quart. 6, 43-56 (1959). 

[3] Bessler, S. A. and A. F. Veinott, Jr., "Optimal Policy for a Dynamic Multi-Echelon Inventory 
Model," Nav. Res. Log. Quart. 13, 335-389 (1966). 

[4] Busacker, R. G. and P. J. Gowen, "A Procedure for Determining a Family of Minimal Cost Net- 
work Flow Patterns," Technical Rept. No. 15, Operations Research Office, Johns Hopkins 
University, Baltimore, Md. (1961). 

*We assume that the function mapping the subscripts $(i, t)$, $i = 0, \ldots, n$, $t = 1, \ldots, N$, to the arcs $(x, y)$ in $A$ is known from the structure of the NFUU.


[5] Clark, A. and H. Scarf, "Optimal Policies for a Multi-Echelon Inventory Problem," Management 

Science, 6, 475-490 (1960). 
[6] Clark, A. and H. Scarf, "Approximate Solutions to a Simple Multi-Echelon Inventory Problem," Chapter 5 in Studies in Applied Probability and Management Science, Arrow, Karlin, and Scarf (eds.) (Stanford University Press, Stanford, Calif., 1962).
[7] Connors, M. and W. Zangwill, "Cost Minimization in Networks with Discrete Stochastic Require- 
ments," Operations Research 19, 794-821 (1971). 
[8] Dantzig, G., Linear Programming and Extensions (Princeton University Press, Princeton, N.J., 1963).
[9] Dreyfus, S. E., "An Appraisal of Some Shortest-Path Algorithms," Operations Research 17, 

395-412 (1969). 
[10] El-Agizy, M., "Dynamic Inventory Models and Stochastic Programming," IBM Journal of Research 

and Development (1969), pp. 351-356. 
[11] Ford, L. R. and D. R. Fulkerson, Flows in Networks (Princeton University Press, Princeton, N.J., 1962).
[12] Ford, L. R. and D. R. Fulkerson, "Constructing Maximal Dynamic Flows from Static Flows," 

Operations Research, 6, 419-433 (1958). 
[13] Fulkerson, D. R., "Increasing the Capacity of a Network, The Parametric Budget Problem," 

Management Science 5, 472-483 (1959). 
[14] Hu, T. C, "Minimum Convex Cost Flows," Nav. Res. Log. Quart. 13, 1-19 (1966). 
[15] Hu, T. C, "Recent Advances in Network Flows," SIAM Review 10, 354-359 (1968). 
[16] Hu, T. C, Integer Programming and Network Flows (Addison Wesley Publishing Co., Reading, 

Mass., 1969). 
[17] Iglehart, D., "Recent Results in Inventory Theory," J. Indust. Eng. 18, 48-51 (1967). 
[18] Iglehart, D., "Recent Developments in Stochastic Inventory Models," Invited Paper at the Na- 
tional Meeting of ORSA, June 19, 1969, Denver, Colorado. 
[19] Prawda, J. and G. P. Wright, "On Some Applications of Network Flows Under Uncertainty," 

Proceedings of the International IEEE Conference on Systems, Networks, and Computers, 

Oaxtepec, Morelos, Mexico (Jan. 19-21, 1971). 
[20] Scarf, H. E., D. Gilford, M. Shelley, Multistage Inventory Models and Techniques (Stanford Uni- 
versity Press, Stanford, Calif., 1963). 
[21] Veinott, A., Jr., "Optimal Policy for Multiproduct, Dynamic, Nonstationary Inventory Problem," 

Management Science 12, 206-222 (1965). 
[22] Veinott, A., Jr., "The Status of Mathematical Inventory Theory," Management Science 12, 

745-777 (1966). 
[23] Zangwill, W., "A Deterministic Multiproduct, Multi-facility, Production and Inventory Model," 

Operations Research 14, 486-507 (1966). 
[24] Zangwill, W., "The Shortest Route Problem under Either Concave or Convex Costs," Presented 

at the 12th Annual Operations Research Society of America Meeting, Santa Monica, California 

[25] Zangwill, W., Nonlinear Programming, A Unified Approach (Prentice Hall, Inc., Englewood 

Cliffs, N.J., 1969). 


Hamdy A. Taha 
University of Arkansas 


A general algorithm is developed for minimizing a well-defined concave function over a convex polyhedron. The algorithm is basically a branch-and-bound technique which utilizes a special cutting plane procedure to identify the global minimum extreme point of the convex polyhedron. The indicated cutting plane method is based on Glover's general theory for constructing legitimate cuts to identify certain points in a given convex polyhedron. It is shown that the crux of the algorithm is the development of a linear underestimator for the constrained concave objective function. Applications of the algorithm to the fixed-charge problem, the separable concave programming problem, the quadratic problem, and the 0-1 mixed integer problem are discussed. Computer results for the fixed-charge problem are also presented.


Consider the problem

(1) $\min f(x)$,

where $x = (x_1, x_2, \ldots, x_n)$ and $Q = \{x \in E^n \mid Ax = b,\; x \ge 0\}$. The function $f(x)$ is assumed to be concave and well defined over the convex polyhedron $Q$. It is also assumed that the constrained minimum of $f(x)$ is finite.

The optimum solution to (1) is characterized by its occurrence at an extreme point of $Q$. However, the principal difficulty is that a local minimum is not necessarily global.

A method for solving (1) was proposed by Hoang Tuy [12], but with the additional requirement that $f(x)$ be concave over all $x \in E^n$. Tuy's algorithm starts by identifying a local minimum point, $\bar{x}$, of $Q$. A hyperplane cut (called Tuy's cut) is then determined and appended to the problem so that all feasible (extreme) points in $Q$ having a worse value than $f(\bar{x})$ are excluded. Informally, Tuy's cut is generally defined by a hyperplane passing through the end points of the extended halflines emanating from the current local minimum such that the associated values of $f(x)$ at these end points are equal to $f(\bar{x})$. It is clear that $f(\bar{x})$ is an upper bound on the optimal objective value and that any extreme point $x \in Q$ having $f(x) \ge f(\bar{x})$ cannot be promising. The process is then continued by searching for a local minimum of the new solution space resulting from the application of the last Tuy cut. If no new local minima exist, the algorithm terminates with the last local minimum being the global optimum. Tuy provides no convergence proof for the algorithm.

This paper presents a new algorithm for solving (1). The algorithm is basically a branch-and-bound method which utilizes a special cutting plane procedure to identify the global minimum extreme point of $Q$. The main difference between this work and Tuy's is that the cuts are generated solely from the geometry of the convex polyhedron $Q$. Also, the identification of the candidate extreme points of $Q$ necessitates defining a linear function which underestimates $f(x)$. The linearity restriction is important since, as will be seen later, it reduces the problem to solving a series of linear programs.

534 H. A. TAHA

Although a method is given for developing a linear underestimator for the general case, illustra- 
tions for developing more efficient (or tighter) underestimators are also provided for important concave 
minimization problems. 

In section II, the generalized branch-and-bound algorithm is presented and its relationship to work by other authors is discussed. Section III introduces the cutting plane method associated with the algorithm. Section IV develops a general linear underestimator for $f(x)$ and shows how "tighter" underestimators are developed for the fixed-charge problem, the separable programming problem, and the 0-1 mixed integer linear problem. Finally, section V illustrates the computational efficiency of the proposed algorithm as applied to the fixed-charge problem.


The general idea of the algorithm is explained as follows. Let $l(x)$ be a linear underestimator of $f(x)$ over $Q$; that is,

(2) $l(x) \le f(x)$, $x \in Q$;

then it is clear that

(2') $\min_x \{ l(x) \mid x \in Q \} \le \min_x \{ f(x) \mid x \in Q \}$.

This means that, starting with the extreme point $x^0$ satisfying $\min \{ l(x) \mid x \in Q \}$, $\underline{f} = l(x^0)$ is a lower bound on the optimum objective value of (1), while, from (2), an obvious upper bound is given by $\bar{f} = f(x^0)$. Now consider $x^1$ $(\ne x^0)$, an adjacent extreme point to $x^0$ such that $x^1$ yields the smallest $l(x)$ among all the adjacent extreme points of $x^0$. It is clear that only those adjacent points having $\underline{f} \le l(x) \le \bar{f}$ need be considered in determining $x^1$. The point $x^1$ is then said to be the next ranked extreme point. (The exact details of the general (cutting plane) procedure for determining the next ranked extreme points will be presented in section III.) Now, the new lower bound is $\underline{f} = l(x^1)$. The upper bound $\bar{f}$ is changed to $f(x^1)$ only if $f(x^1)$ is smaller than the current upper bound $\bar{f} = f(x^0)$.

In general, suppose $E^{i-1} = \{x^0, x^1, \ldots, x^{i-1}\}$ is the set of (nonredundant) extreme points thus far ranked. Then $x^i$, the next ranked extreme point, is selected as the adjacent extreme point to one of the elements in $E^{i-1}$ such that $l(x^i)$ is again the smallest among all such adjacent extreme points, provided $x^i \notin E^{i-1}$, that is, $x^i$ is nonredundant with respect to $E^{i-1}$. The current lower bound is now given by $\underline{f} = l(x^i)$, but the upper bound is changed to $f(x^i)$ only if this quantity is smaller than the best available upper bound $\bar{f}$.

The termination of the procedure is effected at $x^k$ if $\underline{f} = l(x^k) \ge \bar{f}$, with the extreme point associated with $\bar{f}$ being the optimum. This follows since, from (2),

$f(x) \ge l(x^k) \ge \bar{f}$, $x \in Q^* - E^k$,

where $Q^*$ is the set of extreme points of $Q$. This condition shows that all the remaining extreme points $(Q^* - E^k)$ can only yield worse objective values than $\bar{f}$, and are thus nonpromising.


The above discussion can be summarized in algorithmic form as follows:

STEP 0: Solve the linear program

$\min \{ l(x) \mid x \in Q \}$

and let $x^0$ be the optimum extreme point. Define $\underline{f} = l(x^0)$ as the lower bound on the optimum objective value of (1). Let $x^* = x^0$. Then $f(x^*)$ is an upper bound. Set $i = 0$, then go to Step 1.

STEP 1: The current upper and lower bounds are given by $f(x^*)$ and $l(x^i)$, respectively. Let $x^{i+1}$ be the next ranked extreme point of $Q$ and set the new lower bound $\underline{f} = l(x^{i+1})$. Go to Step 2.

STEP 2: If $l(x^{i+1}) \ge f(x^*)$, stop; $x^*$ is optimum. If $f(x^{i+1}) < f(x^*)$, set $x^* = x^{i+1}$ and $f(x^*) = f(x^{i+1})$; otherwise, the upper bound remains unchanged. Set $i = i + 1$ and go to Step 1.
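Steps 0-2 amount to scanning extreme points in nondecreasing order of $l(x)$ while maintaining an incumbent. The sketch below replaces the cutting-plane ranking of section III with a direct sort over an explicitly enumerated vertex set, which is feasible only for tiny illustrative problems; the enumeration and all names are hypothetical, not the paper's procedure.

```python
def rank_and_bound(extreme_points, l, f):
    """Steps 0-2: scan extreme points in nondecreasing order of the linear
    underestimator l; stop when the lower bound l(x^{i+1}) reaches the
    incumbent upper bound f(x*)."""
    ranked = sorted(extreme_points, key=l)  # stand-in for the ranking scheme
    x_star = ranked[0]                      # Step 0: minimizer of l over Q
    upper = f(x_star)
    for x in ranked[1:]:
        if l(x) >= upper:                   # Step 2: bounds have crossed
            break
        if f(x) < upper:                    # improve the incumbent
            x_star, upper = x, f(x)
    return x_star, upper
```

For example, minimizing the concave $f(x) = -(x_1^2 + x_2^2)$ over the unit square with underestimator $l(x) = -x_1 - x_2$ (valid since $x^2 \le x$ on $[0, 1]$, so $l(x) \le f(x)$) terminates after ranking only one additional vertex.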

The general idea of the above algorithm was first proposed by Katta Murty [7] for solving the fixed-charge problem. Murty also indicated [7, Corollary 1] that for $f(x) = D(x) + z(x)$, where $z(x)$ is linear and $D(x)$ is concave, if $l(x)$ is taken equal to $z(x)$, the algorithm is equally applicable. However, it is clear that Murty's corollary is true only if $z(x) \le z(x) + D(x)$, that is, only if $D(x) \ge 0$; this obviously is not valid in general. Later, Cabot and Francis [3] utilized the exact algorithm to solve the case where $D(x)$ is a negative (semi)definite quadratic form. (See section IV.) The Cabot-Francis paper, however, presents the details of Murty's algorithm in a more explicit manner.

The ranking procedure of Step 1, as advanced by Murty, determines the adjacent extreme points to each element (basis) in $E^i$ as the (new) basic solutions in which one of the current (eligible) nonbasic variables is made basic. This requires carrying out a single pivot operation as in the simplex method. The major drawback of Murty's procedure is that the number of generated adjacent extreme points may become very large, to the extent of taxing the computer memory. Moreover, because the same (adjacent) extreme point may be generated from more than one element in $E^i$, a procedure is needed to avoid storing redundant points. Extensive experimentation by this author shows that Murty's algorithm, as applied to the zero-one problem, generally yields very discouraging results (see [10]). The complex bookkeeping procedures required to economize the utilization of the computer memory and to minimize redundancy show distinctly that the algorithm can very easily reach an unmanageable state.

This paper differs from the work of Murty in two respects:

(i) It presents a general algorithm which solves any problem of type (1). This is in contrast with Murty's (or Cabot and Francis') work, which leaves the impression that it can handle only specialized concave problems.

(ii) It develops a new procedure for the details of Step 1 of the algorithm which remedies the drawback of Murty's ranking scheme. The new procedure utilizes a cutting plane technique which uses the "convexity cuts" recently developed by Glover [5].

It must be noted that, by using Murty's ranking procedure, the requirement that the underestimator $l(x)$ be linear is needed only in Step 0. This follows since the theory of linear programming automatically allows the determination of the proper extreme point, $x^0$. Clearly, the linearity assumption is not needed in the ranking procedure of Step 1. This is in contrast with the new ranking procedure, to be introduced in the next section, where the linearity of the underestimator is a mandatory requirement. This follows since the ranked extreme points are determined by applying the dual simplex method of linear programming.



Informally, the idea of the new ranking scheme is explained as follows. Start with $x^0$ obtained at Step 0. Then define a cut that eliminates $x^0$ only from among all the extreme points of $Q$. The hyperplane associated with the cut is determined to pass through the adjacent extreme points of $x^0$. Now, augmenting the linear programming problem with the cut and applying the dual simplex method, the resulting optimum feasible solution yields the next ranked extreme point. A new cut can then be generated from the adjacent extreme points in the new solution space. The process is repeated as necessary.

The above procedure is tailored after a recent development by Glover [5], who lays out a general theory for constructing legitimate cuts which can be used systematically to determine certain points in a given convex polyhedron. A typical illustration is the convex polyhedron $Q$ with its extreme points representing the set of points to be identified. Glover's theory actually generalizes earlier ideas of Young [13] and Balas [1], who developed legitimate cuts for the integer linear programming problem.

To formalize the above discussion, let the current basic solution be defined by the set of equations

(3) $y_i = b_{i0} - \sum_{j \in N} b_{ij} t_j$, $i \in M$, $y_i, t_j \ge 0$,

where the $y_i$ and $t_j$ are the basic and nonbasic variables, respectively. The sets $M$ and $N$ define the indices of the basic and nonbasic variables. The cut referred to above is now described based on Glover's results.

Glover's Convexity Cut Lemma [5]: Let $S$ be a set of points in the convex polyhedron $Q$. If $R$ is a convex set whose interior contains no point in $S$, and if $y_i = b_{i0}$, $i \in M$ (possibly a boundary point of $R$), has a deleted feasible neighborhood which lies in the interior of $R$, then for any constants $t_j^* > 0$, $j \in N$, such that

$y_i = b_{i0} - b_{ij} t_j^* \in R$, $\forall\, i \in M$,

the convexity cut†

(4) $\sum_{j \in N} \frac{t_j}{t_j^*} \ge 1$

excludes the extreme point $y_i = b_{i0}$, $i \in M$, but never any point in $S$.

The application of the above lemma to Step 1 of the algorithm is straightforward. Here the set $S$ consists of the unranked extreme points of the current solution space. The point $y_i = b_{i0}$, $i \in M$, takes the place of the current "ranked" extreme point, and the set $R$ is represented by the convex polyhedron describing the current solution space.

The determination of the constants $t_j^*$ in (4) follows directly from the theory of the simplex method; that is,

$t_j^* = \min_{i \in M,\; b_{ij} > 0} \left\{ \frac{b_{i0}}{b_{ij}} \right\}$, with $t_j^* = \infty$ if all $b_{ij} \le 0$.
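This ratio test can be carried out directly on the tableau coefficients. The sketch below computes each $t_j^*$ and the coefficients of cut (4), using a coefficient of 0 when column $j$ has no positive entry ($t_j^* = \infty$); the tableau layout (a right-hand-side list and a dense coefficient matrix) is a hypothetical illustration.

```python
from math import inf

def convexity_cut(b0, B):
    """Given the basic solution y_i = b0[i] - sum_j B[i][j] * t_j, compute
    t_j* = min over {i : B[i][j] > 0} of b0[i] / B[i][j] (inf if no such i)
    and the coefficients c_j of the convexity cut sum_j c_j * t_j >= 1."""
    m = len(b0)
    nvars = len(B[0])
    t_star = []
    for j in range(nvars):
        ratios = [b0[i] / B[i][j] for i in range(m) if B[i][j] > 0]
        t_star.append(min(ratios) if ratios else inf)
    cut = [0.0 if tj == inf else 1.0 / tj for tj in t_star]
    return t_star, cut
```

For the tableau $b_0 = (4, 6)$ with columns $(2, 3)$ and $(-1, 2)$, one gets $t_1^* = 2$, $t_2^* = 3$, and the cut $\tfrac{1}{2} t_1 + \tfrac{1}{3} t_2 \ge 1$.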

Clearly, $t_j^*$ is strictly positive if the current solution is nondegenerate, that is, $b_{i0} > 0$. When $b_{i0} = 0$ for at least one $i \in M$, then it is possible that $t_j^* = 0$ and the convexity cut (4) becomes undefined.

In order to overcome the above difficulty resulting from a degenerate situation, we use the following procedure due to Balas [1].†

Degeneracy occurs when an extreme point is "overdetermined," that is, when the current solution point has more than $n$ hyperplanes associated with it, where $n$ is the total number of variables. Balas [1] proves that by dropping each constraint for which the associated basic variable is equal to zero, the resulting convex polytope necessarily associates $n$ distinct edges with the current solution vertex. Under this condition, the values of $t_j^*$ are readily determined. Of course, when the cut is added, all the deleted constraints must be reactivated before the problem is reoptimized, unless such constraints are proved to be redundant with respect to $R$, in which case they can be dropped completely.

There are two important points which must be considered in association with the degeneracy problem. These difficulties do not arise in Balas' case mainly because the sets $R$ and $S$ in his problem remain unaffected by the deletion of the constraints associated with the zero basic variables. This obviously is not the case in our situation.

(i) Let $C$ be the degenerate cone associated with the current solution vertex, $X$, and define $L$ as the polytope obtained from the current solution space of the problem by deleting the halfspaces associated with $C$. Further, define $\bar{C}$ as the nondegenerate cone associated with $X$ which is obtained from $C$ by deleting all the constraints satisfying Balas' condition.

Since $C \subset \bar{C}$, the cut obtained from the adjacent extreme points resulting from the intersection of $\bar{C}$ with $L$ cannot be stronger than its equivalent when $C$ replaces $\bar{C}$. This means that the new cut cannot eliminate any of the extreme points of $R\;(= C \cap L)$ which have not been tested for optimality. Consequently, the cut obtained by using $\bar{C}$ is legitimate with respect to $R$.

(ii) The cut obtained by using C̄ will most likely create new extreme points which are not among
the vertices of the original solution space. The question then arises as to the possibility of the op-
timal solution being "trapped" at one of these vertices. This point is refuted as follows:

By the convexity cut lemma, such extreme points (when they occur) must lie on the halfline 

(5) y_i = b_i0 − b_ij t_j, 0 ≤ t_j ≤ t_j*, i ∈ M.

If (5) is an edge (or segment thereof) of the original solution space Q (defined in (1)), then the new extreme 
point is actually a nonvertex of Q. Consequently, it cannot yield an improved solution point as this leads 
to contradiction. A similar argument applies if (5) is a new edge resulting from the application of 
previous cuts. 

† It must be noted that Murty's procedure overcomes the degeneracy problem by enumerating all the basic solutions associ-
ated with the current extreme point. From the computational point of view, this has proved to be very time consuming (see

538 H. A. TAHA 

It is important to notice that the effect of degeneracy goes beyond simple inconvenience in compu- 
tation. Essentially, the creation of new extreme points must reduce the efficiency of the proposed 
method since it may be necessary to test these points for optimality (see the numerical example in sec- 
tion VI for an illustration). Consequently, it seems important that serious consideration must be given to 
minimizing the effect of degeneracy. The work of Thompson, Tonge, and Zionts [11] provides ways for 
eliminating degeneracy in certain situations (as illustrated by the numerical example in section VI).
However, there does not yet exist a general method for handling all degeneracy situations. 


In this section we show how a linear underestimator l(x) can be developed for f(x) in the general
case. Since the efficiency of the proposed algorithm should depend on the selection of the under-
estimator, we also present illustrations showing how tighter underestimators can be developed for an
important class of concave minimization problems. This class includes the fixed-charge problem, the
separable programming problem, the quadratic problem, and the 0-1 mixed integer problem.

(i) General Underestimator l(x): 

From the properties of concave functions, a tangent hyperplane to f(x) at x̄ [assume x̄ ∈ Q, where
Q is the convex polyhedron defined in (1)] overestimates f(x). Consequently, it appears plausible that
we can make use of a tangent hyperplane to g(x) = −f(x) (with modifications) to underestimate f(x).
Let t_g(x) be a tangent hyperplane to g(x) at a given point.† Clearly, for any x,

(6) t_g(x) ≥ g(x).

Now a transition from g(x) to f(x) can be made if g(x) ≤ f(x). Unfortunately, this is not true in
general. However, if the values of x are restricted to those in Q, then the transition can be achieved 
as follows: 

PROPOSITION: Let M ≥ 0 be a real number; then there must exist a value of M < ∞ such that

(7) −M + t_g(x) ≤ f(x), x ∈ Q.

In this case l(x) = −M + t_g(x).

PROOF: We need only show that −M + g(x) ≤ f(x) for all x ∈ Q. The minimum (maximum)
value of f(x) (g(x)) occurs at an extreme point of Q. If min_{x∈Q} f(x) ≥ 0, then max_{x∈Q} g(x) ≤ 0 and obviously
the desired result is achieved for M = 0. Now, suppose min_{x∈Q} f(x) < 0; then max_{x∈Q} g(x) > 0. By assump-
tion, f(x) possesses a finite minimum over Q. Thus M can be selected such that M ≥ |min_{x∈Q} f(x)|.
Since, by symmetry, |min_{x∈Q} f(x)| = max_{x∈Q} g(x), it follows that

−M + g(x) ≤ −M + max_{x∈Q} g(x) ≤ 0 ≤ M + min_{x∈Q} f(x) ≤ M + f(x), x ∈ Q.

Now, since M ≥ |min_{x∈Q} f(x)| can be taken arbitrarily large, letting M ≥ 2|min_{x∈Q} f(x)|, the desired
conclusion follows immediately.

† We further assume that the tangent hyperplane is determined at an x̄ satisfying ∇g(x̄) ≠ 0, where ∇g(x̄) is the gradient
vector of g(x) at x̄. This will ensure that the resulting linear underestimator is not trivial.


The above proposition actually implies that a linear underestimator for f(x), x ∈ Q, can be taken as

l(x) = −Σ_j m_j x_j − M,

where the m_j are positive constants and M ≥ |min_{x∈Q} f(x)|. If min_{x∈Q} f(x) ≥ 0, then M can be taken equal to zero.

Notice that since the lower bound on M is obviously not known a priori, one must rely on some prac-
tical estimate to determine a numerical value for M. Although any values of m_j > 0 can be utilized, fur-
ther research is needed to determine the set of values providing the tightest linear underestimator.
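As a small self-contained check of the proposition (the function f, the region Q, and the constants m_j below are made-up illustrations, not data from the paper), the underestimator l(x) = −Σ m_j x_j − M with M = |min_{x∈Q} f(x)| can be verified numerically on a grid:

```python
import itertools

# Hypothetical concave objective on the unit square Q = [0,1]^2
def f(x):
    return -(x[0]**2 + x[1]**2)          # concave; its minimum over Q is at a vertex

vertices = list(itertools.product([0.0, 1.0], repeat=2))
min_f = min(f(v) for v in vertices)       # attained at (1, 1)
M = abs(min_f)                            # M >= |min f| as in the proposition

m = [1.0, 1.0]                            # any positive m_j is admissible
def l(x):
    return -sum(mj * xj for mj, xj in zip(m, x)) - M

# l underestimates f everywhere on Q (checked on a grid)
grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
assert all(l(x) <= f(x) + 1e-12 for x in grid)
print(min_f, M)   # -2.0 2.0
```

Whether a given M suffices depends on the chosen m_j; the paper's practical recommendation is to estimate a safely large M.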

(ii) Fixed-Charge Problem: 

In the fixed-charge problem, f(x) is defined by 

f(x) = Σ_{j∈N} c_j x_j + Σ_{j∈N} K_j δ(x_j), N = {1, . . ., n},

where δ(x_j) = 0 if x_j = 0, and δ(x_j) = 1 if x_j > 0. The coefficients c_j and K_j are real numbers with K_j > 0
for all j. It can be proved that f(x) is a concave function which is continuous everywhere except at
x = 0. Now, since K_j > 0 by assumption, it follows that

Σ_{j∈N} c_j x_j ≤ Σ_{j∈N} c_j x_j + Σ_{j∈N} K_j δ(x_j), x_j ≥ 0.

This shows that the linear underestimator can be taken as

(8) l(x) = Σ_{j∈N} c_j x_j.

Notice that l(x) is valid for any x_j ≥ 0.

The application of the above estimator will be illustrated numerically in the next section. Notice that 
the same idea can be utilized to solve certain problems that often arise in inventory theory. A typical 
example is the finite horizon, multiple-item model in which price breaks (or quantity discounts) are 
allowed in the ordering function. This typically results in a piecewise-linear concave function. In this 
case, the linear segments representing the smallest per unit ordering cost can be used to determine l(x).
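Using the coefficients of the numerical example in section VI (c = (−4, −3), K = (1, 1/2)), the relation l(x) ≤ f(x) of (8) can be checked directly; the sketch below is illustrative only:

```python
# Fixed-charge objective f(x) = sum c_j x_j + sum K_j delta(x_j), K_j > 0,
# and its linear underestimator l(x) = sum c_j x_j (Equation (8)).
c = [-4.0, -3.0]
K = [1.0, 0.5]

def delta(x):
    return 1.0 if x > 0 else 0.0

def f(x):
    return (sum(cj * xj for cj, xj in zip(c, x))
            + sum(Kj * delta(xj) for Kj, xj in zip(K, x)))

def l(x):
    return sum(cj * xj for cj, xj in zip(c, x))

# Since K_j > 0, l(x) <= f(x) for any x >= 0, with equality only at x = 0.
points = [(0.0, 0.0), (1.0, 2.0), (0.5, 2.5), (2.0, 0.0)]
for x in points:
    assert l(x) <= f(x)
print(l((1.0, 2.0)), f((1.0, 2.0)))   # -10.0 -8.5
```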

(iii) Separable Programming Problem: 



In this problem f(x) = Σ_{j∈N} f_j(x_j), where f_j(x_j) is a concave and well-defined function. Suppose now that the feasible range for each
x_j as defined by the solution space Q is given by a_j ≤ x_j ≤ b_j, where a_j and b_j are known constants.
Let l_j(x_j) = α_j x_j + β_j be the straight line joining the two points (a_j, f_j(a_j)) and (b_j, f_j(b_j)). Since f_j(x_j)
is concave, it follows by definition that

l_j(x_j) ≤ f_j(x_j), a_j ≤ x_j ≤ b_j.



A linear underestimator of f(x) is then given by

(9) l(x) = Σ_{j∈N} l_j(x_j).

It is noted that the fixed-charge problem discussed in (ii) satisfies the condition for a separable 
concave problem. Consequently, the above linear underestimator can also be used with the fixed- 
charge problem. Notice, however, that the present underestimator is tighter than that defined in 
(ii). This follows since it is defined for constrained values of x only, as compared with x ≥ 0 in the
fixed-charge problem.
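A sketch of the chord construction, with f_j(x_j) = √x_j on [0, 4] as a made-up concave term:

```python
import math

# Chord underestimator for a separable concave term f_j on [a_j, b_j]:
# l_j(x) = alpha*x + beta joins (a_j, f_j(a_j)) and (b_j, f_j(b_j)).
def chord(fj, a, b):
    alpha = (fj(b) - fj(a)) / (b - a)
    beta = fj(a) - alpha * a
    return lambda x: alpha * x + beta

fj = math.sqrt            # concave on [0, 4] (illustrative choice)
a, b = 0.0, 4.0
lj = chord(fj, a, b)

# Concavity implies l_j(x) <= f_j(x) on [a, b], with equality at the endpoints.
xs = [a + (b - a) * i / 20.0 for i in range(21)]
assert all(lj(x) <= fj(x) + 1e-12 for x in xs)
assert abs(lj(a) - fj(a)) < 1e-12 and abs(lj(b) - fj(b)) < 1e-12
print(lj(1.0), fj(1.0))   # 0.5 1.0
```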

(iv) Quadratic Minimization Problem: 



In this problem f(x) = z(x) + D(x), where z(x) is linear and D(x) is a negative (semi)definite quadratic form. If D(x) = xBxᵀ, then B is
negative (semi)definite. The linear underestimator in this case was developed by Gilmore [4] and by
Lawler [6] and was subsequently utilized by Cabot and Francis [3] in connection with Murty's algorithm.
Let b_j be the jth column of B. Then D(x) can be written as

D(x) = Σ_j (x b_j) x_j.

Define u_j = min_{x∈Q} {x b_j}. Hence


(10) l(x) = z(x) + Σ_j u_j x_j ≤ z(x) + D(x).


Notice that since x_j ≥ 0 for all j, then for any finite w_j ≤ u_j, z(x) + Σ_j w_j x_j still provides a legitimate
underestimator. This result can sometimes be used advantageously to avoid solving n linear programs.
An application of this situation is given below.
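The construction of (10) can be sketched on a made-up two-variable instance in which Q is the unit box, so that each u_j can be found by enumerating the vertices rather than solving a linear program:

```python
import itertools

# Linear underestimator for f(x) = z(x) + x B x^T with B negative definite:
# u_j = min over Q of (x . b_j), then l(x) = z(x) + sum u_j x_j  (Equation (10)).
B = [[-2.0, 1.0],
     [1.0, -2.0]]          # negative definite (illustrative data)
cz = [1.0, 1.0]            # z(x) = cz . x

vertices = list(itertools.product([0.0, 1.0], repeat=2))
def dot(p, q): return sum(a * b for a, b in zip(p, q))
def col(j): return [row[j] for row in B]

# Each minimum over the box Q is attained at one of its vertices.
u = [min(dot(x, col(j)) for x in vertices) for j in range(2)]

def D(x): return dot(x, [dot(row, x) for row in B])
def z(x): return dot(cz, x)
def l(x): return z(x) + sum(uj * xj for uj, xj in zip(u, x))

grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
assert all(l(x) <= z(x) + D(x) + 1e-12 for x in grid)
print(u)   # [-2.0, -2.0]
```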

(v) Zero-One Mixed Integer Problem: 

In this problem 

f(x) = Σ_{j∈N} c_j x_j, x_j = (0, 1) for j ∈ N¹ ⊆ N.


The function f(x) can be written equivalently as

f(x) = Σ_{j∈N−N¹} c_j x_j + Σ_{j∈N¹} (c_j + M(1 − x_j)) x_j, M > 0 and very large,


where x_j ≥ 0, j ∈ N, and x_j ≤ 1, j ∈ N¹. The expression M(1 − x_j) assigns a very high penalty to x_j for 0 <
x_j < 1, j ∈ N¹, thus allowing it to take binary values only. The mixed integer objective function has been
equivalently converted into a quadratic function in which the quadratic form M Σ_{j∈N¹} (−x_j²) is clearly
negative definite. Notice that all the variables in the new form are continuous.

The above equivalence relationship was developed by Raghavachari [8] and independently by 
Taha [10] in an effort to secure a simpler formulation for the mixed 0-1 integer problem. 

The transformed f(x) is exactly in the same form as the function in (iv). Thus, using the method
in (iv), u_j, j ∈ N¹, is defined by

(11) u_j = min {−M x_j | x ∈ Q, 0 ≤ x_j ≤ 1}
 ≥ min {−M x_j | 0 ≤ x_j ≤ 1} = −M.

Thus, taking u_j = −M, it follows from the development in (iv) that l(x) = Σ_{j∈N} c_j x_j. This shows that
l(x) can be taken as f(x) after removing the condition x_j = (0, 1), j ∈ N¹. Notice that if u_j is determined
from the exact linear program in (11), the main difference would be that u_j = 0 for any j ∈ N¹ for which
Q forces x_j = 0; that is, the new l(x) will be the same as above except that the indicated x_j
are set equal to zero.

The above result can also be derived on an intuitive basis. Since for the 0-1 mixed integer prob-
lem the optimum must occur at an extreme point, the integrality condition can be replaced by the
continuous range 0 ≤ x_j ≤ 1. It then follows that min {Σ_j c_j x_j | 0 ≤ x_j ≤ 1, j ∈ N¹} must underestimate
min {Σ_j c_j x_j | x_j = (0, 1), j ∈ N¹} since the former is less restrictive. This, incidentally, means that the trans-
formation of f(x) given above does not yield any privileged information and hence is trivial.

Notice that by using the continuous range 0 ≤ x_j ≤ 1, j ∈ N¹, the resulting objective function be-
comes linear in x_j over its feasible values. Thus the new objective function may be considered concave
over the feasible space and the general algorithm in section II becomes applicable. In this case the
upper bound f̄ is defined equal to ∞ for any extreme point not satisfying x_j = (0, 1), j ∈ N¹. The impor-
tant point, however, is that the cut as defined in section III is uniformly weaker than its equivalent
as developed by Balas [1]. On the other hand, the determination and use of Balas' stronger cut re-
quires more complex computation as compared with ours. Consequently, the real merit of either cut
can only be checked through computational experimentation.
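The penalty transformation can be checked numerically; the values of c and M below are made up for illustration:

```python
# Each binary x_j contributes (c_j + M*(1 - x_j)) * x_j, which equals c_j*x_j
# at x_j in {0, 1} and adds a penalty M*x_j*(1 - x_j) > 0 at fractional x_j.
M = 1000.0
c = [3.0, -5.0]

def f_transformed(x):
    return sum((cj + M * (1.0 - xj)) * xj for cj, xj in zip(c, x))

def f_binary(x):
    return sum(cj * xj for cj, xj in zip(c, x))

# Agreement at binary points:
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert abs(f_transformed(x) - f_binary(x)) < 1e-9

# Strict penalty at a fractional point:
assert f_transformed((0.5, 0.5)) > f_binary((0.5, 0.5))
print(f_transformed((1, 1)), f_transformed((0.5, 0.5)))   # -2.0 499.0
```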


This section illustrates the efficiency of the proposed algorithm by applying it to the fixed-charge 
problem. This special case is selected primarily because of its practical interest. In addition, the avail- 
ability in the literature of computational results for other fixed-charge methods allows a more meaningful 
evaluation of the proposed algorithm. 

In order to clarify the details of the algorithm, especially those associated with the degeneracy 
problem, we first introduce a numerical example. This will be followed by a presentation of the com- 
puter results as applied to randomly generated problems. 



minimize f(x) = φ₁(x₁) + φ₂(x₂)

subject to

2x₁ + x₂ + S₁ = 4
x₁ + x₂ + S₂ = 3
0 ≤ x₂ ≤ 5/2
x₁, S₁, S₂ ≥ 0,

where

φ₁(x₁) = 0 if x₁ = 0, and φ₁(x₁) = −4x₁ + 1 if x₁ > 0;
φ₂(x₂) = 0 if x₂ = 0, and φ₂(x₂) = −3x₂ + 1/2 if x₂ > 0.

Thus, l(x) = −4x₁ − 3x₂. Using Dantzig's technique to accommodate the upper bound on x₂, Table I
gives the solution specifying x⁰. A graphical display of the solution is given in Figure 1.

Table I

[simplex tableau; the entries are not recoverable from the scan]

x⁰ = (1, 2); Point (1)
f̄(x⁰) = −10 + (1 + 1/2) = −8 1/2

Cut #1 is now developed. (Notice that the determination of the constants t_j* of the cut must be
based on Dantzig's upper bounding technique.) Thus,

t₁* = min {1/1, 1/2, ∞} = 1/2,
t₂* = min {2/2, ∞, ∞} = 1,

and the cut is given by

t₁/(1/2) + t₂/1 ≥ 1.






FIGURE 1. Solution of the numerical example.

Expressed in terms of x₁ and x₂, the cut is

(Cut #1) 5x₁ + 3x₂ ≤ 10;

we denote its slack by S₃.

Table II yields x¹ as a result of augmenting Table I by Cut #1 and reoptimizing using the dual
simplex method for upper bounded variables.

Table II

[simplex tableau; the entries are not recoverable from the scan]

x¹ = (1/2, 5/2); Point (2)
l(x¹) = −9 1/2
f̄(x¹) = −9 1/2 + (1 1/2) = −8 > f̄(x*);
f̄(x*) = f̄(x⁰)

Notice that in Table II, x₂ is basic at its upper bound. This means that the current solution is degen-
erate. Using Balas' condition which, in this case, calls for ignoring the equations involving basic vari-
ables at upper bound or zero level, it is clear that the x₂-equation must be disregarded in developing
Cut #2. Thus,

t₁* and t₂* are determined as before [the min expressions are not recoverable from the scan].
This yields a new cut which, when expressed in terms of x₁ and x₂, is given by

(Cut #2) 6x₁ + 4x₂ + S₄ = 12, S₄ ≥ 0.

Table III gives the new solution after Cut #2 is effected. Notice that x₂ = 5/2 − x₂′. Notice also that
since S₃ is associated with a previous cut and since it is basic, its corresponding equation can be dropped
in future tableaus.

Table III

[simplex tableau; the entries are not recoverable from the scan]

x² = (1/3, 5/2); Point (3)
l(x²) = −8 5/6
f̄(x²) = −8 5/6 + 1 1/2 = −7 1/3 > f̄(x*)

Cut #3 is now generated from Table III. This gives

(Cut #3) 30x₁ + 24x₂ ≤ 60.

The application of this cut will yield point (4) with x³ = (2, 0) and l(x³) = −8. Since l(x³) > f̄(x*), the
process terminates. Thus x* = x⁰ is the optimum solution.
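Since the example is small, a direct ranking of the five extreme points of its feasible region confirms the outcome x* = x⁰ = (1, 2) reached by the cuts:

```python
# Extreme points of the example's feasible region
# {2x1 + x2 <= 4, x1 + x2 <= 3, 0 <= x2 <= 5/2, x1 >= 0}:
vertices = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (0.5, 2.5), (0.0, 2.5)]

def phi1(x1): return 0.0 if x1 == 0 else -4.0 * x1 + 1.0
def phi2(x2): return 0.0 if x2 == 0 else -3.0 * x2 + 0.5
def f(x): return phi1(x[0]) + phi2(x[1])

best = min(vertices, key=f)
assert best == (1.0, 2.0)
assert f(best) == -8.5          # f(x*) = -8 1/2, matching the text
print(best, f(best))            # (1.0, 2.0) -8.5
```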

Notice the effect of degeneracy at point (2). Point (2) is (over)determined by the three lines x₂ = 5/2,
x₁ + x₂ = 3, and 5x₁ + 3x₂ = 10. Balas' condition drops x₂ = 5/2. The cone C̄, as introduced in section
III, is then defined by the halfplanes x₁ + x₂ ≤ 3 and 5x₁ + 3x₂ ≤ 10, which yields Cut #2. The optimum
point (3) is a new extreme point which does not belong to the original solution space.

It is remarked that if the redundant constraint x₁ + x₂ ≤ 3 is eliminated instead of x₂ ≤ 5/2, then
Cut #2 would have been stronger as it would pass through the extreme points (2, 0) and (0, 5/2) of the
original solution space. Stanley Zionts, in a private communication to the author, shows that by using
the results in [11], this specific degeneracy situation can be avoided. The idea is as follows: Prior to
constructing a cut constraint, if there is any degeneracy, write the degenerate constraint so that the
right-hand-side element is zero. (In [11], methods for identifying redundant (and, of course, redundant
degenerate) constraints are provided.) Applying this to Table II, x₂ is replaced by 5/2 − x₂′. In order
for the redundant constraint to be implied in "defi-
nitional" form, x₂′ must now be made nonbasic with S₂ being the new basic variable. This yields Table
IV (Table II revised).

Table IV

[simplex tableau; the entries are not recoverable from the scan]

Notice that the S₂-row is now redundant and may be dropped from the tableau. But more impor-
tantly, the generated cut is

t₁/(5/2) + t₂/(5/2) ≥ 1,

which now passes through the points (2, 0) and (0, 5/2), thus bypassing the extra point (3) and its
associated cut.

Computer Results 

The testing of the algorithm as applied to the fixed charge problem is designed to check the effect 
of the size of the problem and the magnitude of the fixed charge on the speed of computation. Random 
problems of the type 

max {Σ_j (c_j x_j + K_j δ(x_j)) | Σ_j a_ij x_j ≤ b_i, x_j ≥ 0, i = 1, . . ., m}

are generated with their coefficients lying in the ranges 

0 ≤ c_j ≤ 999
0 ≤ K_j ≤ 160
−20 ≤ a_ij ≤ 100
0 ≤ b_i ≤ 200.

The sizes of the generated problems are given by (m × n) = (5 × 20), (5 × 30), (10 × 20), and (15 × 30).
In order to test the effect of the fixed charge, the same problems are used again with K_j replaced by
2K_j and 3K_j, respectively. No special structure is specified for the problems and the density of the matrix
||a_ij|| is at least 97 percent.

The algorithm is coded in FORTRAN IV for the IBM 360/50. The results are summarized in Table V.

One of the basic difficulties we encountered in coding the algorithm was the control of machine 
round-off error. This is important since a zero variable may be rounded to a positive value, thus affecting 



TABLE V. Summary of Computation

(Time in seconds)

[tabulated times for (m × n) = (5 × 20), (5 × 30), (10 × 20), and (15 × 30), each under K_j, 2K_j, and 3K_j; the individual entries are not recoverable from the scan]

the bounds directly. The problem was overcome by using double-precision computation as well as 
appropriate tolerances. Also, checks were implemented in the code to detect the accumulation of ma- 
chine round-off error. For example, an important check is to test whether at a given iteration the number 
of positive variables among the original variables exceeds the number of original constraints. It must
be remarked that the five problems in Table V were selected from among 20 test problems as the ones 
yielding the least amount of "disorder" from the viewpoint of machine round-off error. The remaining 
problems were excluded by the checks in the code because they indicated uncontrollable round-off 
error. It is felt, however, that a professional programmer should be able to develop a more efficient and 
accurate code than the one written by the author. 

Although the results in Table V are generally compatible with what one may expect (that is, the
average computation time increases with the increase in the fixed charges), the individual problems
exhibit peculiar behavior which needs explanation. For example, problem 3 of size (10 × 20) requires
1.5 seconds for K_j, 192 seconds for 2K_j, and again 192 seconds for 3K_j. This result can be justified as

The termination of the algorithm occurs at the extreme point x′ when l(x′) ≥ f̄(x*). It is obvious
that the computation time of the problem is primarily a function of the number of extreme points which 
are ranked before termination occurs. Thus, two problems having the same solution space will require
the same computation time if they terminate at the same x′. Notice that l(x) depends only on the
linear terms of the objective function and that its value at an extreme point does not depend on the fixed
charges, while f̄(x*) depends directly on the fixed charges. Consequently, if l(x′) − f̄(x*) for
2K_j is large enough to accommodate an increase in the fixed charges to 3K_j, termination still occurs at
x′ and the same computation time is consumed. Similarly, if l(x′) − f̄(x*) for K_j is too small, an in-
crease in the fixed charges to 2K_j may necessitate further ranking of new extreme points before ter-
mination is effected.

The results in Table V also show that the computation time increases more appreciably with the
increase in the number of constraints than with the number of variables. These results differ
from those associated with cutting-plane algorithms in integer programming, where the number of vari-
ables is the main factor affecting the computation time. The reason for this appears to be that our


algorithm depends more directly on the number of extreme points of the solution space which is a 
function of both the number of constraints and the number of variables. 

For the sake of comparing our algorithm with other exact methods for the fixed charge problem, 
we only came across two algorithms by Bod [2] and Steinberg [9]. The two methods are of the branch 
and bound type. Bod's method utilizes what may be termed as a partial enumeration technique for test- 
ing all the extreme points (basic feasible solutions) of the convex polyhedron. The effective use of bounds 
on the objective value excludes most of the nonpromising extreme points. Steinberg's method, on the 
other hand, initiates two problems at each node according to whether the variable Xj is zero or positive. 
Bounds on the objective value are also used to effect the proper termination of the algorithm. 

Bod does not present computer results for his algorithm, but Steinberg tests two sets of problems
with sizes (5 × 10) and (15 × 30) on the IBM 360/50. The average computation times per problem for
the two sets are 10 sec and 21.1 min, respectively. This is far inferior to the average computation time
obtained by our algorithm, especially since Steinberg's algorithm can easily tax the computer memory.
He reports that a set of 15 problems, with size (5 × 10) each, requires an average of 32 nodes while
those with size (15 × 30) each require an average of 1,208 nodes. This shows that the number of nodes
can become very large even for problems with modest sizes. The problem is not present in our algorithm
since, as in any cutting-plane algorithm, the size of the matrix A at any iteration cannot exceed
(m + n) × n.

We must remark also that, contrary to our algorithm, Steinberg's algorithm becomes slower as the
magnitude of the fixed charge decreases. He utilizes the ranges 0 ≤ c_j ≤ 20 and 0 ≤ K_j ≤ 999 for his
test problems, but does not study the effect of variations in K_j on the speed of computation.


The algorithm presented in this paper is general in the sense that it can handle any concave minimi- 
zation problem over a convex polyhedron. If the computer results of the algorithm as applied to the fixed 
charge problem are at all indicative of its efficiency, it would appear that the algorithm can actually 
be used to solve practical problems. Further research is still needed, however, to develop the tightest 
linear underestimator for f(x). Also, since degeneracy is a pronounced problem in our algorithm, a 
general method is needed for treating the degenerate case without weakening the resulting cuts. This 
should improve the efficiency of computation considerably. 


The author wishes to thank Professor Stanley Zionts, State University of New York at Buffalo, for 
his helpful comments. 


[1] Balas, E., "Intersection Cuts — A New Type of Cutting Plane for Integer Programming," Opera- 
tions Research 19, 19-39 (1971). 

[2] Bod, P., "Solution of a Fixed Charge Linear Programming Problem," Proceedings of Princeton 
Symposium on Mathematical Programming (Princeton University Press, Princeton, New Jersey, 
1970), pp. 367-375. 

[3] Cabot, A. V. and R. L. Francis, "Solving Certain Nonconvex Quadratic Minimization Problems 
by Ranking the Extreme Points," Operations Research 18, 82-86 (1970). 


[4] Gilmore, P. C., "Optimal and Suboptimal Algorithms for the Quadratic Assignment Problem,"

SIAM Journal 10, 305-313 (1962). 
[5] Glover, F., "Convexity Cuts and Cut Search," Operations Research, 21, 123-134 (1973). 
[6] Lawler, E. L., "The Quadratic Assignment Problem," Management Science 9, 586-599 (1963). 
[7] Murty, K. G., "Solving the Fixed Charge Problem by Ranking the Extreme Points," Operations 

Research 16, 268-279 (1968). 
[8] Raghavachari, M., "On the Zero-One Integer Programming Problem," Operations Research 17, 

680-684 (1969). 
[9] Steinberg, D. I., "The Fixed Charge Problem," Nav. Res. Log. Quart. 17, 217-235 (1970).
[10] Taha, H., "On the Solution of Zero-One Linear Programs by Ranking the Extreme Points," 

Technical Rept. No. 71-2, University of Arkansas (Feb. 1971) revised May 1972. 
[11] Thompson, G. L., F. Tonge, and S. Zionts, "Techniques for Removing Nonbinding Constraints and
Extraneous Variables from Linear Programming Problems," Management Science 12, 588-608 (1966).
[12] Tuy, H., "Concave Programming Under Linear Constraints," Soviet Math 5, 1437-1440 (1964). 
[13] Young, R. D., "New Cuts for a Special Class of 0-1 Integer Programs," Research Report, Rice 
University, Texas (Nov. 1968). 



Laurence Lee George 

University of Louisville 
Louisville, Kentucky 


Avinash C. Agrawal 

University of British Columbia 
Vancouver, B.C., Canada 


The maximum likelihood estimator of the service distribution function of an M/G/∞
service system is obtained based on output time observations. This estimator is useful when
observation of the service time of each customer could introduce bias or may be impossible.
The maximum likelihood estimator is compared to the estimator proposed by Mark Brown
[2]. Relative to each other, Brown's estimator is useful in light traffic while the maximum
likelihood estimator is applicable in heavy traffic. Both estimators are compared to the em-
pirical distribution function based on a sample of service times and are found to have draw-
backs, although each estimator may have applications in special circumstances.


Suppose customers arrive at a service system at instants T₁, T₂, . . ., Tₙ, where {Tₙ} is a sta-
tionary Poisson process with rate parameter λ customers per unit time. Each customer is served upon
arrival and there are sufficient servers. Service times are independently and identically distributed with
some unknown distribution function G(t), t ≥ 0. These conditions describe the M/G/∞ service system.
They are often found in self service systems. In design of such systems it may be necessary to determine 
the unknown service distribution. Direct observations on the service time for each customer that enters 
the system may not be possible because of the economic constraints or because of other factors such 
as introduction of unavoidable bias, or simply, the actual behaviour of the customers while in the system 
is unobservable. An example of the first case may be cars entering a freeway where the distribution 
function of the time spent by cars on the freeway is to be estimated and tracing each car individually 
to find the time spent on the freeway may be extremely expensive. A similar situation may also exist 
in any store where it may not be possible to follow each customer through the store. Another effect of 
making direct observations on service time is to bias observations as customers may become conscious 
of being observed. It is for these reasons that direct observations on service time may not be possible. 
The service distribution, therefore, is hidden and estimation must be based on information other than a 
sample of service times. 

*This research was supported in part by the Defence Research Board of Canada, Grant Number 9701-25, when the authors
were at the University of British Columbia. 




Mirasol [5] shows that the output of an M/G/∞ service system is a nonstationary Poisson process:

(2.1) Pr (number of departures in (0, t) = n | system initially empty)

 = e^{−λ∫₀ᵗ G(x)dx} (λ ∫₀ᵗ G(x)dx)ⁿ / n!, n = 0, 1, . . .,

where G(·) is the common service time distribution function and λ is the Poisson arrival rate. The
intensity function of this time-dependent process, λ·G(t), is both nonnegative and nondecreasing and
is bounded above by λ, the Poisson arrival rate.

The likelihood function for a nonstationary Poisson process with t₁, t₂, . . ., tₙ as the times of
occurrence of events is given by the joint density function

(2.2) f_{T₁, . . ., Tₙ}(t₁, t₂, . . ., tₙ; λ(t)) = Pr (observing events at t₁, t₂, . . ., tₙ; λ(t))

 = [∏ᵢ λ(tᵢ)] · exp (−∫₀^{tₙ} λ(x) dx),

where λ(t) is the intensity function of the Poisson events. The first step in the problem under study
involves finding a function λ(t), t ≥ 0, which maximizes the likelihood function given by Equation
(2.2) for fixed t₁, t₂, . . ., tₙ under the condition that λ(t), t ≥ 0, is nonnegative and nondecreasing.
The maximum likelihood estimate of λ(t), t ≥ 0, satisfying these conditions has been obtained by Bos-
well [1] as

(2.3) λ̂(t) = 0, if 0 ≤ t < t₁,
 λ̂(t) = min {M, λ̂(t_k)}, if t_k ≤ t < t_{k+1}, k = 1, 2, . . ., (n − 1),
 λ̂(t) = M < ∞, if t ≥ tₙ,

where

(2.4) λ̂(t_k) = max_{1≤α≤k} min_{k≤β≤n−1} {(β − α + 1)/(a_α + a_{α+1} + . . . + a_β)},

with a_k = t_{k+1} − t_k, k = α, α + 1, . . ., β.

It may be noted that in the absence of an upper bound M on the value of λ(t), the solution obtained
will carry no meaning, as (2.2) can be made arbitrarily large by setting λ(t) = ε > 0 for t < tₙ and setting
λ(tₙ) arbitrarily large. Therefore, let λ(t) ≤ M for some fixed positive number M.


The maximum likelihood estimate λ̂(t) for the function λ(t), t ≥ 0, may be used to estimate λ·G(t),
t ≥ 0, from output observations of an M/G/∞ system during some interval [0, T], T > t. This will give an
estimate of λ·G(t) for t ∈ [0, T]. To obtain estimates of λ·G(t) for small t, the output process should
be observed for small t. For large values of t, the output becomes a stationary Poisson process at rate λ,
and G(t) is estimated as 1. If the input rate λ is assumed to be known (it may be
estimated from input data) and it is also assumed that the system starts empty, then the maximum like-
lihood estimate of the service distribution function G(t) is given by

(2.5) Ĝ(t) = min [λ̂(t)/λ, 1].

In case it is desired to relieve the assumption that the system starts empty at t = 0, one must consider
the first outputs as order statistics from G(t), given the number in the system at t = 0, possibly mixed
with outputs which arrive after t = 0.
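The estimator can be sketched in code under one plausible reading of the reconstructed formula (2.4) (the output times below are made up, and the cap M of (2.3) is omitted); the resulting estimate is nonnegative and nondecreasing, as required of λ·G(t):

```python
# Max-min estimator for a nondecreasing intensity, following one plausible
# reading of Boswell's formula (2.4); not a verified transcription.
def lam_hat(times):
    """Estimate lambda(t_k) at each event time t_k, k = 1, ..., n-1."""
    n = len(times)
    a = [times[i + 1] - times[i] for i in range(n - 1)]   # inter-event gaps
    est = []
    for k in range(n - 1):
        best = 0.0
        for alpha in range(k + 1):                         # alpha <= k
            worst = float("inf")
            for beta in range(k, n - 1):                   # beta >= k
                span = sum(a[alpha:beta + 1])
                worst = min(worst, (beta - alpha + 1) / span)
            best = max(best, worst)
        est.append(best)
    return est

# Output times of an M/G/inf system (made-up data).
times = [0.9, 1.4, 1.7, 1.9, 2.0, 2.05]
est = lam_hat(times)
assert all(e >= 0 for e in est)
assert all(est[i] <= est[i + 1] + 1e-12 for i in range(len(est) - 1))

lam = 12.0                                                 # assumed known arrival rate
G_hat = [min(e / lam, 1.0) for e in est]                   # Equation (2.5)
print([round(g, 3) for g in G_hat])
```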

The hidden service distribution G(t) for an M/G/∞ system may also be estimated by peeking at
the system only at times t₁, t₂, . . ., tₙ and observing the numbers in the system N(t₁), . . ., N(tₙ).
This sequence may also be used for maximum likelihood estimation because the number of customers
in the M/G/∞ service system is also a nonstationary Poisson process with intensity function λ(1 − G(t))
nonincreasing in t. The maximum likelihood estimator λ̂(t) from (2.4) may be made into a maximum
likelihood estimator of a nonincreasing function by reversing the max-min operation in (2.4). From this
estimate of λ(1 − G(t)), t ≥ 0, one can obtain an estimate of G(t), t ≥ 0.

Simulated output times of M/D/∞ and M/M/∞ service systems have been used in the calculation of the
maximum likelihood estimators. Comparison of these estimates to the true service distribution and to 
another estimator is made in section 4. 


The maximum likelihood estimator of λ·G(t) is a step function with jumps at the output times
T₁ ≤ T₂ ≤ . . . ≤ Tₙ. The first nonzero value of the estimate of G(t) occurs at or after T₁, giving no in-
formation about G(t) for t < T₁. This limitation may be removed by taking observations on N repeated
runs of the service system starting empty. The ordered output times for all runs are used in calculat-
ing the estimator of G(t). The maximum likelihood property of this estimator still holds and the
estimator Ĝ_N(t) is given by

(3.3) Ĝ_N(t) = min [λ̂(t)/(Nλ), 1],

where N is the number of runs.

A lower bound on the expected time of the first output from N runs is the expected first input time
in N runs, 1/(λN). Extreme value theory suggests that asymptotically the time of the first observation
on G(t) will become smaller as the number of runs increases. Let D_ij be the departure time of
the ith customer in the jth run, where i = 1, 2, . . ., n, j = 1, 2, . . ., N. D_ij = T_ij + S_ij, where T_ij and
S_ij are the arrival and service times, respectively. The first departure over N runs will take place at the
time given by min_{i,j} {T_ij + S_ij}. By extreme value theory, min_{i,j} {T_ij + S_ij} will have a Weibull
distribution asymptotically, Gumbel [3], no matter what the distribution function of the random variable
(T_ij + S_ij) may be,

provided (T_ij + S_ij) > 0. The expected value of a random variable x having the Weibull distribution is
given by

(3.4) E(x) = α · Γ(1 + 1/β),

where α is the scale parameter and β is the shape parameter of the Weibull distribution. The scale
parameter α can be estimated as the mth order statistic (m counted from the bottom) for which

(3.5) 1 − m/(N + 1) = 1/e = 1/2.718 . . . .

As N increases, m will increase, which means that the value of α, the scale parameter, will decrease.
Thus the expected value given by Equation (3.4) will go to zero asymptotically for large values of N.
In the context of the service system, this means that the expected time of the first departure will asymp-
totically decrease to the lower support of the distribution as the number of runs increases. In other
words, the mean of the minimum order statistic of a random variable is of the order of the quantile for
which the probability value is 1/(N + 1), and thus will decrease to the smallest possible value of the
random variable asymptotically with N.
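Equation (3.4) is easy to evaluate directly; the parameter values below are illustrative:

```python
import math

# Mean of a Weibull random variable, E(x) = alpha * Gamma(1 + 1/beta)  (3.4).
def weibull_mean(alpha, beta):
    return alpha * math.gamma(1.0 + 1.0 / beta)

# beta = 1 reduces to the exponential distribution, whose mean is alpha:
assert abs(weibull_mean(2.0, 1.0) - 2.0) < 1e-12

# Shrinking the scale parameter alpha drives the mean toward zero, as argued above.
assert weibull_mean(0.1, 2.0) < weibull_mean(1.0, 2.0)
print(weibull_mean(1.0, 2.0))   # 0.8862... = sqrt(pi)/2
```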

A simple illustration can be given by considering the service time to be a constant, t_0. The expected
time of the first departure in N runs, each run with n observations, is given by

(3.6)  E min_{i,j} {T_ij + S_ij} = E{min_{i,j} (T_ij) + t_0}
                                = E{min_{i,j} (T_ij)} + t_0
                                = 1/(λNn) + t_0,

where λ is the arrival rate of the Poisson arrival stream.

It can be seen from (3.6) that the expected time of the first departure in the case of constant service
time t_0 converges to the lower bound of the support of the service time distribution faster than 1/(N + 1),
as long as n > 1.
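As a numerical illustration of this convergence (a sketch of ours, not part of the original study; the function names are illustrative), one may simulate the first departure time with constant service t_0 and Poisson arrivals, and watch its mean approach t_0 as the number of runs N grows:

```python
import numpy as np

def mean_first_departure(lam, t0, N, reps=20000, seed=1):
    # With constant service t0, the first departure over N runs occurs at
    # t0 plus the smallest first-arrival time among the runs; the first
    # arrival epoch of a Poisson stream of rate lam is exponential(lam).
    rng = np.random.default_rng(seed)
    first_arrivals = rng.exponential(1.0 / lam, size=(reps, N))
    return t0 + first_arrivals.min(axis=1).mean()

lam, t0 = 1.0, 0.5
for N in (5, 50, 500):
    print(N, mean_first_departure(lam, t0, N))
```

The simulated expected first departure decreases toward the lower support t_0 as N increases.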

The proof by Marshall and Proschan [4] of the strong consistency of the maximum likelihood estimate
of a distribution function under the assumption of increasing failure rate may be applied to show that
the maximum likelihood estimator of an increasing distribution function G(t) is strongly consistent at
the points of continuity; i.e.,

(3.7)  lim_{N→∞} G_N(t) = G(t),

with probability 1, for a sufficiently large number of repeated observations N on the output of the
service system started empty. This may be done because the failure rate function r(t), given as

r(t) = F′(t)/(1 − F(t)),

for an increasing failure rate distribution function F(t), corresponds to a nondecreasing intensity function
λ(t) of a nonstationary Poisson process. In fact λ(t) is the failure rate function of the distribution
function of the event times conditional on previous event times. The maximum likelihood estimate of
λ(t) based on the event times of a nonstationary, nondecreasing intensity Poisson process is the same as
the maximum likelihood estimator of the failure rate function r(t) from a nondecreasing failure rate
distribution.

4.1 Brown's Estimator 

In Figure 1, the number of customers in the system N(t) is plotted against time t. Let the origin
on the time axis be shifted to the right so that it coincides with the first output after the old origin 0.

FIGURE 1. Number of units in the system N(t) at time t vs. time t.

Y_i, i = 1, 2, . . ., n, is the time between the new origin 0′ and the ith output point after the new origin.
Z_i is the time from Y_i back to the nearest input point prior to Y_i. For a stationary input process and
independent, identically distributed service times in steady state behavior, the Z_i are independent
and identically distributed. Let H(·) be the distribution function of Z_i, i = 1, 2, . . ., n, and H_n(·)
the empirical distribution function based on the observations Z_1, Z_2, . . ., Z_n.

Pr[Z_i > x] = Pr[time back from the ith output to the last previous input > x]

           = 1 − H(x)

           = Pr[no input in the interval of length x ∩ service takes longer than x]

           = e^{−λx}(1 − G(x)).

Thus

1 − H(x) = e^{−λx}(1 − G(x)).



Thus, an estimate of H(·), given by H_n(x), may be used for estimating the service distribution
function G(x):

(4.1)  Ĝ(x) = 1 − e^{λx}(1 − H_n(x)).

This estimate may not be nondecreasing. A nondecreasing estimate of G(x) is obtained by modifying
Equation (4.1):

(4.2)  Ĝ_n(x) = max[0, max_{i: Z_i ≤ x} {1 − e^{λZ_i}(1 − H_n(Z_i))}].
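A small sketch of this estimator (ours, not the authors' code; the function and argument names are illustrative) evaluates (4.1) at the observed Z_i and applies the monotone correction (4.2) via a running maximum:

```python
import numpy as np

def brown_estimator(z, lam, x):
    # z: observed times Z_i from each output back to the last prior input;
    # lam: known Poisson arrival rate; x: evaluation points.
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    Hn = np.arange(1, n + 1) / n                  # empirical d.f. at the Z_i
    raw = 1.0 - np.exp(lam * z) * (1.0 - Hn)      # Equation (4.1) at the Z_i
    running = np.maximum.accumulate(raw)          # max over {i : Z_i <= x}
    idx = np.searchsorted(z, np.asarray(x, dtype=float), side="right") - 1
    vals = np.where(idx >= 0, running[np.clip(idx, 0, None)], 0.0)
    return np.maximum(vals, 0.0)                  # Equation (4.2)
```

For an M/M/∞ system with λ = 1 and exponential service at rate μ = 0.5, the Z_i are exponential with rate λ + μ = 1.5, and the estimate tracks G(x) = 1 − e^{−0.5x}.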

4.2 Numerical Results 

The maximum likelihood estimate of the hidden service distribution function G(x) was obtained
for simulated operation of an M/M/∞ system for various arrival rates λ and service rates μ. The results
shown in Figure 2 correspond to an arrival rate of λ = 1 customer/min and exponential service at rate

FIGURE 2. Maximum Likelihood Estimate (MLE), N = 5, n = 50.

μ = 0.1 customer/min. The simulation was carried out for five runs, each consisting of 50 observations. The
estimated values Ĝ_N(x) are plotted against the output times T_i for i = 1, 2, . . ., n. The empirical distribution
function as well as an exponential distribution function for the service times are also plotted for
the purpose of comparing the simulated results. It can be seen that the estimated distribution function
is close to the empirical and the actual distribution functions. Brown's estimator was simulated for an
M/M/∞ system in steady state with λ = 1 customer/min and exponential service at rate μ = 0.5 customer/
min. The results are shown in Figure 3, and it is found that Brown's estimated distribution function is close
to the empirical as well as the actual distribution function.

Further simulations with different exponential service rates have shown that while Brown's method
gives reasonable results for a system having a service rate close to or larger than the arrival rate, the maximum
likelihood estimator is useful for slow service rate systems having large numbers of customers in the

























FIGURE 3. Brown's estimator.

system. This contrasting behaviour of the two estimators may be used in order to obtain better results
by using Brown's estimator in the case of fast service and the maximum likelihood estimator in the case of slow
service.

Simulation of the estimators was also performed for constant service times. The same remarks as
above apply to the usefulness of the two estimators relative to the service rate. It was also noted that the
maximum likelihood estimator converged to the true unknown service time from above, while Brown's
estimator was less biased.


[1] Boswell, M. T., "Estimation and Testing Trend in a Stochastic Process of Poisson Type," The
Annals of Mathematical Statistics, 37, 1564-1573 (1966).

[2] Brown, M., "An Estimation Problem in M/G/∞ Queues with Applications to Traffic," Technical
Rept. No. 59, Department of Operations Research (Cornell University, Ithaca, New York, 1968).

[3] Gumbel, E. J., Statistics of Extremes (Columbia University Press, New York, 1958).

[4] Marshall, A. W. and F. Proschan, "Maximum Likelihood Estimation for Distributions with Monotone
Failure Rate," The Annals of Mathematical Statistics, 36, 69-77 (1965).

[5] Mirasol, N. M., "The Output of an M/G/∞ Queuing System is Poisson," Operations Research, 11,
282-284 (1963).


Marcel F. Neuts* and Eugene Klimko
Purdue University 


This paper deals with the stationary analysis of the finite, single server queue in discrete
time. The following stationary distributions and other quantities of practical interest
are investigated: (1) the joint density of the queue length and the residual service time,
(2) the queue length distribution and its mean, (3) the distribution of the residual service
time and its mean, (4) the distribution and the expected value of the number of customers
lost per unit of time due to saturation of the waiting capacity, (5) the distribution and the
mean of the waiting time, (6) the asymptotic distribution of the queue length following
departures.

The latter distribution is particularly noteworthy, in view of the substantial difference
which exists, in general, between the distributions of the queue lengths at arbitrary points
of time and those immediately following departures.


This paper is a direct sequel to [2], to which we refer for a detailed definition and for the assumptions
of the finite, discrete time queue. For easy reference, we only give a summary of the notation here.

L_1  Maximum number of customers allowed in the system at any time. All excess customers are
lost and do not return.

L_2  Maximum duration of the service time of a single customer.

r_j  Probability that a service lasts for j units of time, j = 1, . . ., L_2. We assume without loss of
generality that r_{L_2} > 0. Also r_1 + . . . + r_{L_2} = 1.

K  Maximum number of arrivals during a unit of time. It is assumed that K < L_1.

p_j  Probability that j customers arrive during a unit of time, j = 0, 1, . . ., K. We assume without
loss of generality that p_0 > 0 and p_K > 0. Also p_0 + . . . + p_K = 1.

X_n  The number of customers in the system at time n+.

Y_n  The number of time units until the customer in service at time n+ completes service. We note
that 0 ≤ Y_n ≤ L_2 and that Y_n = 0 if and only if X_n = 0.

In [2], it was shown that the bivariate sequence {(X_n, Y_n), n ≥ 0} is an irreducible, aperiodic Markov
chain with state space {(0, 0)} ∪ {(1, 2, . . ., L_1) × (1, . . ., L_2)}. Its transient behavior was discussed
and investigated numerically in [2]. In this paper we first discuss the stationary joint distribution
of the queue length X_n and the residual service time Y_n.

*The research of this author was supported by the National Science Foundation, Contract No. GP 28650.




We denote the stationary probabilities by P(i, j) for i = 1, . . ., L_1 and j = 1, . . ., L_2; P(0, 0)
is the stationary probability that the queue is empty. The stationary joint density of X_n and Y_n is the
unique solution to the following system of linear equations (with the convention p_m = 0 for m < 0 or m > K):

(1)  a.  P(0, 0) = p_0[P(1, 1) + P(0, 0)].

b.  P(i, j) = Σ_{v=1}^{i} p_{i−v} P(v, j+1) + r_j [ (p_i/(1 − p_0)) P(1, 1) + Σ_{v=2}^{i+1} p_{i−v+1} P(v, 1) ],

for 1 ≤ i ≤ K, 1 ≤ j ≤ L_2 − 1.

c.  P(i, j) = Σ_{v=i−K}^{i} p_{i−v} P(v, j+1) + r_j Σ_{v=i−K+1}^{i+1} p_{i−v+1} P(v, 1),

for K + 1 ≤ i ≤ L_1 − 1, 1 ≤ j ≤ L_2 − 1.

d.  P(L_1, j) = P(L_1, j+1) + Σ_{v=1}^{K} (1 − Σ_{k=0}^{v−1} p_k) P(L_1 − v, j+1) + r_j Σ_{v=1}^{K} (1 − Σ_{k=0}^{v−1} p_k) P(L_1 − v + 1, 1),

for 1 ≤ j ≤ L_2 − 1.

e.  P(i, L_2) = r_{L_2} [ (p_i/(1 − p_0)) P(1, 1) + Σ_{v=2}^{i+1} p_{i−v+1} P(v, 1) ],

for 1 ≤ i ≤ K.

f.  P(i, L_2) = r_{L_2} Σ_{v=i−K+1}^{i+1} p_{i−v+1} P(v, 1),

for K + 1 ≤ i ≤ L_1 − 1.

g.  P(L_1, L_2) = r_{L_2} Σ_{v=1}^{K} (1 − Σ_{k=0}^{v−1} p_k) P(L_1 − v + 1, 1).

h.  P(0, 0) + Σ_{i=1}^{L_1} Σ_{j=1}^{L_2} P(i, j) = 1.

The system (1) contains L_1 L_2 + 1 independent linear equations in L_1 L_2 + 1 unknowns. We shall show
that its solution may be conveniently expressed in terms of the solution of a homogeneous system of
L_1 equations in L_1 unknowns. Moreover, the latter system has a particular structure which greatly
simplifies its numerical solution.

We denote by P_j the L_1-tuple [P(1, j), . . ., P(L_1, j)] for j = 1, . . ., L_2. We also introduce the
L_1 × L_1 matrices A and B defined as follows (again with p_m = 0 for m < 0 or m > K):

A_{v,i} = p_{i−v},  for v ≤ i ≤ L_1 − 1,
A_{v,L_1} = 1 − Σ_{k=0}^{L_1−v−1} p_k,
A_{v,i} = 0,  for i < v;

B_{1,i} = p_i/(1 − p_0),  for 1 ≤ i ≤ L_1 − 1,
B_{v,i} = p_{i−v+1},  for 2 ≤ v ≤ L_1, v − 1 ≤ i ≤ L_1 − 1,
B_{v,L_1} = 1 − Σ_{k=0}^{L_1−v} p_k,  for 2 ≤ v ≤ L_1,
B_{v,i} = 0,  otherwise.

In terms of A and B, the equations (1b-g) may be written as

(2)  P_j = P_{j+1} A + r_j P_1 B,  1 ≤ j ≤ L_2 − 1,

     P_{L_2} = r_{L_2} P_1 B.
The latter system is equivalent to the equations

(3)  P_j = r_{L_2}^{−1} P_{L_2} Σ_{v=j}^{L_2} r_v A^{v−j},  1 ≤ j ≤ L_2 − 1,

     P_{L_2} = P_{L_2} Σ_{v=1}^{L_2} r_v A^{v−1} B.

We now observe that both A and B are stochastic matrices, that A is upper triangular, and that the
matrix B has only one subdiagonal. We shall say, for brevity, that B is nearly upper triangular. Since
r_1 + . . . + r_{L_2} = 1, and A is an upper triangular stochastic matrix, the matrix Σ_{v=1}^{L_2} r_v A^{v−1} is stochastic
and upper triangular. The stochastic matrix B is irreducible, so that the matrix

(4)  Q = Σ_{v=1}^{L_2} r_v A^{v−1} B,

is irreducible and stochastic. Finally it is easy to verify that Q is nearly upper triangular.

The vector P_{L_2} is therefore proportional to the vector of the stationary probabilities of the matrix Q.
The nearly upper triangular form of the matrix Q makes the numerical computation of the vector
P_{L_2}, up to a positive multiplicative constant, particularly simple. The vector P_{L_2} is proportional to the
vector (t_1, t_2, . . ., t_{L_1}), whose components may be computed recursively as follows:

(5)  t_1 = 1,

     t_2 = (1 − q_{11}) q_{21}^{−1},

     t_k = q_{k,k−1}^{−1} [ t_{k−1}(1 − q_{k−1,k−1}) − Σ_{v=1}^{k−2} t_v q_{v,k−1} ],  3 ≤ k ≤ L_1.

It is easy to verify that none of the entries q_{k,k−1}, 2 ≤ k ≤ L_1, vanish, so that by using the first equation
in (2), the vectors P_j, j = 1, . . ., L_2 − 1, may be computed up to a common, positive multiplicative
constant. Equation (1a) is then used to determine P(0, 0) up to the same multiplicative constant. This
constant may finally be computed using Equation (1h). The stationary joint density of the queue length
and the residual service time is therefore determined.
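The recursion (5) is simple to program. The following sketch (ours, not the authors' FORTRAN code) computes the normalized stationary vector of a nearly upper triangular stochastic matrix:

```python
import numpy as np

def stationary_hessenberg(Q):
    # Stationary row vector of an irreducible stochastic matrix with only
    # one nonzero subdiagonal ("nearly upper triangular"), computed by the
    # recursion of Formula (5), then normalized to sum to one.
    L = Q.shape[0]
    t = np.zeros(L)
    t[0] = 1.0
    if L > 1:
        t[1] = (1.0 - Q[0, 0]) / Q[1, 0]
    for k in range(2, L):
        t[k] = (t[k-1] * (1.0 - Q[k-1, k-1]) - t[:k-1] @ Q[:k-1, k-1]) / Q[k, k-1]
    return t / t.sum()
```

Each step uses only one column of Q, so the structured storage described in section 6 suffices.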



The support of the stationary density {w_j} of the waiting time consists of the integers 0, 1, . . .,
L_1 L_2. Clearly w_0 = P(0, 0), and for 1 ≤ j ≤ L_1 L_2 the density may be written symbolically as the convolution
polynomial

(6)  {w_j} = P(1, ·) + P(2, ·) * {r_v} + P(3, ·) * {r_v}^{(2)} + . . . + P(L_1, ·) * {r_v}^{(L_1−1)},

where {r_v} is the density of the service time.

The numerical computation of the w_j, 1 ≤ j ≤ L_1 L_2, by using a convolution analogue of Horner's
algorithm for polynomials was discussed in [2].
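The Horner-type evaluation of the convolution polynomial (6) can be sketched as follows (our illustration, not the original subroutine; the array layout, with index j holding the mass at j time units, is an assumption):

```python
import numpy as np

def waiting_time_density(P, r, w0):
    # P: L1 x L2 array with P[i-1, j-1] = P(i, j); r: length-L2 array with
    # r[j-1] = r_j; w0 = P(0, 0).  Densities live on the absolute grid
    # 0, 1, ..., L1*L2 (index = number of time units).
    L1, L2 = P.shape
    size = L1 * L2 + 1
    rv = np.zeros(size); rv[1:L2 + 1] = r           # service-time density {r_v}
    def row(i):                                     # P(i+1, .) on the grid
        v = np.zeros(size); v[1:L2 + 1] = P[i]; return v
    # Horner: w = P(1,.) + {r}*(P(2,.) + {r}*( ... + {r}*P(L1,.) ))
    acc = row(L1 - 1)
    for i in range(L1 - 2, -1, -1):
        acc = np.convolve(acc, rv)[:size] + row(i)
    acc[0] += w0
    return acc
```

Truncating each convolution at L_1 L_2 + 1 terms is harmless, since the waiting time cannot exceed L_1 L_2.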


Since the waiting room is finite, it is possible that customers will be lost due to the waiting room
being full at their arrival time. It is therefore of interest to know the stationary density {φ_j} of the
number of lost customers per unit of time. It has its support on the integers 0, 1, . . ., K and may be
determined by the explicit expressions

(7)  φ_j = Σ_{k=j}^{K} p_k Σ_{ν=1}^{L_2} P(L_1 − k + j, ν),  1 ≤ j ≤ K,

     φ_0 = 1 − Σ_{j=1}^{K} φ_j.

Knowing the joint density discussed in section 2, the probabilities {φ_j} are readily computed.
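Formula (7) translates directly into code; the sketch below (ours, with illustrative names) takes the joint density and the arrival density and returns {φ_j}:

```python
import numpy as np

def lost_customer_density(P, p):
    # P: L1 x L2 array with P[i-1, j-1] = P(i, j); p: arrival density with
    # p[k] = p_k for k = 0, ..., K.  Implements Formula (7).
    L1, _ = P.shape
    K = len(p) - 1
    marg = P.sum(axis=1)                  # sum over nu of P(., nu)
    phi = np.zeros(K + 1)
    for j in range(1, K + 1):
        # j customers are lost when k >= j arrive and L1 - k + j are present
        phi[j] = sum(p[k] * marg[L1 - k + j - 1] for k in range(j, K + 1))
    phi[0] = 1.0 - phi[1:].sum()
    return phi
```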


The probabilities associated with the queue length at departure times are primarily of interest
in the analytic treatment of queues of M|G|1 type. Although they are frequently examined, their inherent
applied interest is limited.

As we shall indicate below, the density of the queue length following departures may easily be
obtained from auxiliary quantities which are computed in the process of evaluating the joint stationary
density, discussed in section 2. In view of the importance ascribed to this density in the applied queueing
literature, we decided to investigate its computational aspects. Note the very substantial difference
which may exist between it and the stationary density of X_n.

The queue lengths following departures form an irreducible, aperiodic Markov chain with state
space {0, 1, . . ., L_1 − 1}. Let us denote its transition probability matrix by T. Furthermore, let θ_k(i, ν)
be the probability that in k consecutive units of time during which no departures occur, ν customers
join the queue, given that the queue length at the beginning of the first unit of time was i.

The entries of T are then given by

(8)  T_{0j} = Σ_{k=1}^{L_2} r_k Σ_{h=1}^{K} p_h (1 − p_0)^{−1} θ_k(h, j − h + 1),  for 0 ≤ j ≤ L_1 − 1,

     T_{ij} = Σ_{k=1}^{L_2} r_k θ_k(i, j − i + 1),  for 1 ≤ i ≤ j + 1,

     T_{ij} = 0,  for i > j + 1.

We note that the transition probability matrix T is nearly upper triangular. The stationary probabilities
corresponding to T may be calculated by a simple recursion such as in Formula (5). In order
to evaluate the entries of the matrix T, we first show that

(9)  θ_k(i, j − i + 1) = (A^k)_{i,j+1},  for 1 ≤ i ≤ L_1, 0 ≤ j ≤ L_1 − 1,

where A is the upper triangular matrix defined in section 2.

For k = 1, we find that

(10)  θ_1(i, j − i + 1) = p_{j−i+1},  for 0 ≤ j − i + 1 ≤ K, j ≤ L_1 − 2,

      θ_1(i, j − i + 1) = Σ_{h=L_1−i}^{K} p_h,  for L_1 − K ≤ i ≤ L_1, j = L_1 − 1,

      θ_1(i, j − i + 1) = 0,  for all other pairs (i, j),

so that Equation (9) holds for k = 1. Furthermore

(11)  θ_{k+1}(i, j − i + 1) = Σ_{ν=max(0, j−K)}^{j} θ_k(i, ν − i + 1) p_{j−ν},

for 0 ≤ j ≤ L_1 − 2, and

      θ_{k+1}(i, L_1 − i) = Σ_{ν=L_1−K}^{L_1} θ_k(i, ν − i) Σ_{h=L_1−ν}^{K} p_h,

for 1 ≤ i ≤ L_1. When expressed in terms of the matrix A, Formula (11) proves (9) inductively.

The matrix T can be compactly written as

(12)  T = C Σ_{k=1}^{L_2} r_k A^k,

where C_{1j} = p_j(1 − p_0)^{−1} for 1 ≤ j ≤ K, C_{i,i−1} = 1 for 2 ≤ i ≤ L_1, and C_{ij} = 0 for all other pairs (i, j).
The relation between the limiting distribution of the queue length following the nth departure
and the stationary queue length distribution is noteworthy. A well-known theorem, from Reference
[3], states that in a stable M|G|1 queue with single arrivals, the queue length at time t and the queue
length following the nth departure have the same limiting distribution as t and n respectively tend to
infinity.

An analogous result holds for the present queue, as discussed by Dafermos and Neuts [1],
provided that the probability of a zero service time is zero. This result is proved
by an exact analogue of the argument given there; we shall omit the proof.

In the case of group arrivals (K > 1), a substantial difference may exist between those two limiting distributions.
Theorems which relate those distributions to each other may be obtained by using the theory of Markov renewal
processes, but the resulting formulas are involved. We do not pursue this topic here, but we
offer as an illustration some numerical results.

We considered a queue which has rare arrivals of large groups of
customers: p_0 = 0.975, p_K = 0.025, r_1 = r_2 = 0.5.
Although the traffic intensity ρ for this queue is low, examination of the transient behavior
shows that this queue converges very slowly.

The limiting distribution of the queue length following a departure has a mean equal to 32.2864.
In contrast, the limiting distribution of the queue length at time n has a mean equal to 24.1752. In
addition, we list a summary of the numerical values of both stationary distributions; π*_k is the stationary
probability of at most k customers following a departure from the system, and π_k is the stationary
probability of at most k customers.

TABLE 1

The greater limiting probability of large queue lengths following departures may appear to be
paradoxical at a casual reading. A moment's reflection shows, however, that, on the contrary, this is to
be anticipated in stable queues with rare arrivals of large groups. In our example, the queue length will typically
be zero for long intervals of time because arrivals are rare. The averaging procedure involved in
the stationary distribution of the queue length at time n then heavily favors the lower values of k. The
limiting distribution of the queue length following the nth departure effectively ignores the long idle
periods and results primarily from the queue lengths observed during the service of the large groups of
customers. The high probabilities of large values of k in this distribution are therefore not
surprising.

This example strikingly shows that an asymptotic discussion of the queue features may be of
limited practical value, even in very stable queues. Most realizations of the queue length process in
our example will exhibit very substantial fluctuations which are not reflected in the asymptotic distributions.
The practical questions related to queues of this type can only be answered after analyzing
their transient behavior. The exclusive concern with asymptotic results in "practical" discussions of
queueing theory is therefore regrettable.


In order to minimize both the computation time and the required memory storage, we took advantage
of the highly structured form of the matrices Q and T in Equations (4) and (12), respectively.
The basic matrix is the upper triangular polynomial matrix Q* = {q*_{ij}},

(13)  Q* = Σ_{v=1}^{L_2} r_v A^{v−1}.

The rows of this matrix are similar in the sense that

(14)  q*_{i,i+v} = q*_{1,v+1},

for v = 0, 1, 2, . . ., L_1 − i − 1; i = 2, 3, . . ., L_1 − 1.
Furthermore the matrix Q* is stochastic, so that

(15)  q*_{i,L_1} = 1 − Σ_{v=1}^{L_1−i} q*_{1,v}.

Therefore, the first row determines the entire matrix. This permits the storage of Q* using only L_1
memory spaces, rather than the (L_1² + L_1)/2 spaces required for an arbitrary upper triangular matrix.
The resulting saving in memory space is substantial for large queues and in fact makes the analysis
of queue lengths up to 800 feasible. Computation of the matrix Q* is performed by using Horner's
method for the evaluation of polynomials, i.e., by the recursive computation

(16)  Q*_1 = r_{L_2} A + r_{L_2−1} I,

      Q*_n = Q*_{n−1} A + r_{L_2−n} I,  n = 2, . . ., L_2 − 1.

Each of the successive matrices Q*_n is completely determined by its top row. The right-most elements
are not needed and therefore are not computed. The top row entries of Q*_{n+1} are rapidly calculated by
means of the formulas

(17)  q*^{(n+1)}_{1,1} = p_0 q*^{(n)}_{1,1} + r_{L_2−n−1},

      q*^{(n+1)}_{1,j} = Σ_{i=0}^{min(K, j−1)} p_i q*^{(n)}_{1,j−i},  for 1 < j ≤ L_1 − 1,

where q*^{(n)}_{1,j} denotes the jth entry of the top row of Q*_n.
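The recursions (16)-(17) may be sketched as follows, storing only the top row of each Q*_n (our sketch, assuming L_2 ≥ 2; the function and argument names are illustrative):

```python
import numpy as np

def qstar_top_row(p, r, L1):
    # Top row of Q* = sum_v r_v A^{v-1}, computed by the Horner recursion
    # (16)-(17) while storing only the first row (justified by (14)).
    # p[k] = p_k for k = 0..K; r[j-1] = r_j for j = 1..L2.
    K, L2 = len(p) - 1, len(r)
    a = np.zeros(L1 - 1)                 # top row of A, last column omitted
    a[:K + 1] = p
    q = a.copy()
    q *= r[L2 - 1]                       # Q*_1 = r_{L2} A + r_{L2-1} I
    q[0] += r[L2 - 2]
    for n in range(2, L2):               # Q*_n = Q*_{n-1} A + r_{L2-n} I
        new = np.zeros(L1 - 1)
        for j in range(L1 - 1):
            lo = max(0, j - K)
            new[j] = sum(q[v] * p[j - v] for v in range(lo, j + 1))
        new[0] += r[L2 - n - 1]
        q = new
    return q
```

The omitted right-most column can be recovered at any time from (15).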

The matrix Q has the form

| q_{11}  q_{12}  q_{13}  q_{14}  . . .  q_{1,L_1−1}     q_{1,L_1}   |
| q_{21}  q_{22}  q_{23}  q_{24}  . . .  q_{2,L_1−1}     q_{2,L_1}   |
|         q_{21}  q_{22}  q_{23}  . . .  q_{3,L_1−1}     q_{3,L_1}   |
|                 q_{21}  q_{22}  . . .  q_{4,L_1−1}     q_{4,L_1}   |
|                            . . .                                   |
|                                 q_{21} q_{L_1,L_1−1}   q_{L_1,L_1} |

where the third through the last rows, except for the last two columns, are essentially repetitions of the
second row. The last column is determined by the condition that the rows sum to one. We therefore
need to compute and store only the first and second rows and the (L_1 − 1)-st column. This requires
3L_1 − 4 memory cells for the storage of the Q matrix rather than the L_1 + (L_1 + 2)(L_1 − 1)/2 required
for an arbitrary nearly upper triangular matrix. The top row elements of Q are given by


(18)  q_{1j} = (p_j/(1 − p_0)) q*_{11} + Σ_{i=0}^{j−1} p_i q*_{1,j−i+1},  for j ≤ K,

      q_{1j} = Σ_{i=0}^{K} p_i q*_{1,j−i+1},  for K < j ≤ L_1 − 2,

      q_{1,L_1−1} = p_0 (1 − Σ_{j=1}^{L_1−1} q*_{1j}) + Σ_{i=1}^{K} p_i q*_{1,L_1−i};

the second row elements of the Q matrix are

(19)  q_{2i} = Σ_{j=0}^{min(K, i−1)} p_j q*_{1,i−j},  for 1 ≤ i ≤ L_1 − 2;

and the (L_1 − 1)-st column elements are calculated by using

(20)  q_{i,L_1−1} = p_0 q*_{i,L_1} + Σ_{j=1}^{min(K, L_1−i)} p_j q*_{i,L_1−j},  for 2 ≤ i ≤ L_1.

The stationary probabilities of the Q matrix were determined using Formula (5) and its compact
representation by Formulas (18)-(20). For this purpose, a subroutine called STAPROB was written. The
resulting stationary probability vector was identified temporarily with the vector P_{L_2}. The vectors
P_{L_2−1}, . . ., P_1 were successively obtained by using

(21)  P_j = P_{j+1} A + (r_j/r_{L_2}) P_{L_2},  for j = L_2 − 1, . . ., 1;

for this computation essentially only the top row of the matrix A is needed. Finally P(0, 0) is computed
from Equation (1a), and the common multiplicative constant is determined from the normalization condition (1h).


The waiting-time distribution was computed by a subroutine called WAIT. This
subroutine was adapted from the one used in [2]. In cases where L_1 L_2 is large, one may wish
to print only the percentage points of the distribution. A routine to do this was also written.

The computational procedure for the queue length distribution following a departure is similar to that for the
stationary queue length distribution. The matrix Σ_{k} r_k A^k
is first computed, and then the matrix T is assembled in a manner similar to that of the
matrix Q. Only a modicum of additional computation is involved. The stationary distribution is then
calculated by the subroutine STAPROB.

As an additional test of the program, we compared the stationary probabilities
with the transient probabilities after 60 units of time, which were obtained by the methods developed
in [2].

Computational Experience 

Practical limits on the problem size are set primarily by memory requirements. The available
memory space of 150K octal words permits, for instance,
queue lengths of size 800 with service times of up to 25 points. For problems of this magnitude
the computation time was a limiting factor in the computation of the waiting-time distribution. We
ran examples both with and without the computation of the waiting time. The central processing times
on the CDC 6500 at Purdue University are shown in Table 2. T_1 and T_2 are the
actual program running times in seconds (without compilation and loading times), respectively, with and
without the computation of the waiting-time distribution. For the example with L_1 = 800, L_2 = 25,
K = 4, the time T_1 was in excess of 3,000 sec, and the computations were not completed even then.
In all the examples, we used the same arrival distribution p_0 = 0.8, p_1 = p_2 = p_3 = p_4 = 0.05. The service
time distribution for the first three examples was the same, with r_j = 0.05 for the remaining points and r_5 = 0.175. In the
last example, the service time distribution was geometric with p = 0.5, and the residual
probability was added to r_{L_2}.

Table 2 









Large discrete, single server queues in the stationary phase may be analyzed numerically. As we 
have shown, most queue features of interest, with the possible exception of the stationary waiting-time 
distribution, can be computed without the use of excessive processing times. This should be contrasted 
with simulation methods which are inherently ill-suited for the study of the stationary phase. 

The prohibitive processing times required for the waiting-time distribution in large queues raise
the interesting question of how to evolve efficient numerical procedures for the evaluation of expressions
of the general type of the convolution polynomial (6), which appear frequently in stochastic models of varied applied interest.

Finally, the example discussed in section 5 shows that in queues exhibiting large fluctuations, it
may be hazardous to base conclusions on a single stationary distribution. In such cases one should study
the transient behavior, whenever possible.

For further information on the algorithms discussed in this paper, one may contact either of the 
authors at the Department of Statistics, Purdue University, West Lafayette, Ind. 47907. 


[1] Dafermos, S. and M. F. Neuts, "A Single Server Queue in Discrete Time," Cahiers du Centre de 

Recherche Operationnelle 13, 23-40 (1971). 
[2] Neuts, M. F., "The Single Server Queue in Discrete Time — Numerical Analysis I," Nav. Res. Log. 

Quarterly, 20, 297-304 (1973). 
[3] Takacs, L., Introduction to the Theory of Queues (Oxford University Press, New York, 1962). 


James K. Hartman 

Naval Postgraduate School 
Monterey, California 


When applied to a problem which has more than one local optimal solution, most non- 
linear programming algorithms will terminate with the first local solution found. Several 
methods have been suggested for extending the search to find the global optimum of such a 
nonlinear program. In this report we present the results of some numerical experiments 
designed to compare the performance of various strategies for finding the global solution. 


It is frequently the case in applied optimization studies that an algorithm which is known to con- 
verge to a global optimal solution under certain conditions (such as convexity) will be applied to a prob- 
lem which does not satisfy these conditions. In particular, optimization problems which are suspected 
of having several local optima in addition to the global optimum are often solved using algorithms 
which will stop and indicate a solution whenever any local optimum is reached. In such cases a useful 
strategy is to repeat the solution process several times starting from different initial points to avoid 
accepting a solution which is only a local optimum. This is probably the most frequently suggested 
strategy for avoiding local solutions. 

There are also other strategies for avoiding the local solutions in favor of the global optimum. This 
paper describes some numerical experiments which were done to compare the performance of several 
strategies for organizing such a global optimization. 


In order to develop and test strategies for avoiding local solutions it is necessary to specify a class 
of optimization problems to be considered. This paper will concentrate on the "essentially uncon- 
strained" nonlinear programming problem 

(1) minimize f(x)

    subject to x ∈ S ⊂ E^n,

where the local and global optimal solutions to (1) are known to occur in the interior of the set S. In 
such a problem the feasible region S determines a domain to be searched for solutions, but the bound- 
aries of S do not determine the solutions. In this sense problem (1) can be considered "essentially 
unconstrained." The simplest way to specify the set S is to place upper and lower bounds on each 
variable. Since each of the strategies to be considered will involve random selections of x, it is necessary 
to confine the search to a bounded region S. In addition, search strategies S5 and S6 will partition S 
into smaller regions; these two strategies can only be conveniently described for S determined by 
upper and lower bounds on the variables. 



"Essentially unconstrained" problems arise frequently as the "unconstrained" subproblems in
interior penalty function algorithms such as the Sequential Unconstrained Minimization Technique of
Fiacco and McCormick [3]. In the SUMT method, if the original nonlinear program is not a convex
program, then the subproblem (1) may have local solutions which are distinct from the global solution.

For problems like (1) a local optimal solution can be obtained by applying any of the efficient
unconstrained descent algorithms (such as the Davidon-Fletcher-Powell method) to minimize the function
f(x) while being careful not to penetrate the boundary of S. We shall now consider several strategies
which try to ensure that the local solution we finally accept is, in fact, a global minimum.


Six different strategies for organizing a global optimization are compared in this paper. These are 
briefly described below with references to more complete descriptions when they exist. 
STRATEGY S1 (From the folklore):

a. Set k = 1.

b. Let x^k be a vector chosen at random in the search region S. Starting at x^k, perform an unconstrained
minimization search on the function f(x), terminating at the local minimum x^k*.

c. Replace k with k + 1 and go to step b. At each stage retain the best local solution obtained to
date.

S1 is the strategy suggested in section 1. Intuitively the problem with this strategy is that it may repeatedly
search to the same local minimum if the starting points x^k happen to be chosen within the
"range of attraction" of that local minimum. The next three strategies attempt to solve this problem.
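A minimal sketch of S1 follows (ours; the one-dimensional test function and the crude gradient-descent local minimizer are purely illustrative):

```python
import random

def multistart(f, local_min, sample, iterations=20, seed=0):
    # Strategy S1: start a local minimization from random points in S
    # and retain the best local solution obtained to date.
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(iterations):
        x_star = local_min(sample(rng))
        if f(x_star) < best_f:
            best_x, best_f = x_star, f(x_star)
    return best_x, best_f

# Illustrative problem: two local minima, the global one near x = -1.
def f(x):
    return (x * x - 1.0) ** 2 + 0.3 * x

def local_min(x0, lr=0.01, steps=800):
    # crude fixed-step gradient descent standing in for a real minimizer
    x = x0
    for _ in range(steps):
        x -= lr * (4.0 * x * (x * x - 1.0) + 0.3)
    return x

x_best, f_best = multistart(f, local_min, lambda rng: rng.uniform(-2.0, 2.0))
```

With 20 random starts, the search almost surely lands at least once in the range of attraction of the global minimum near x ≈ −1.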

STRATEGY S2:

a. Set k = 1. Let f* be the objective function value for the best local solution so far obtained.
Initially f* = +∞.

b. Randomly select points x ∈ S until one is found with f(x) < f*. Call this point x^k.

c. Starting at x^k, perform an unconstrained minimization search terminating at a new local minimum
x^k*.

d. Set f* = f(x^k*), replace k with k + 1, and go to step b.

In S2 a minimization (step c) is initiated at x^k only if f(x^k) is smaller than the best solution f* found
to date. Hence, each successive minimization gives a new local minimum which is better than any found
so far. The same local minimum cannot be located twice. It is, however, much more difficult to determine
the starting points x^k for strategy S2 than for S1.
STRATEGY S3 (Bocharov [1]):

a. Choose x^1 randomly in S. Set k = 1.

b. Starting from x^k, perform an unconstrained minimization terminating at the local minimum
x^k*.

c. Choose a direction vector d^k ∈ E^n at random and consider f(x^k* + αd^k) as the positive scalar
α increases. Moving away from x^k* in direction d^k, the function f must initially increase (since x^k*
is a local minimum). Continue to increase α until f begins to decrease, when α = α_k.

d. Let x^{k+1} = x^k* + α_k d^k, replace k with k + 1, and go to step b.

STRATEGY S4 (Bocharov [1]):

S4 is the same as S3 except that in step c, instead of choosing the direction at random, d^k is
chosen to be the direction of overall progress from the most recent minimization:

(2) d^k = x^k* − x^k.

Both S3 and S4 attempt to prevent repeated minimization to the same local optimum by moving out
of the region of attraction of the most recent local solution before starting the next minimization. By
continuing in the direction (2), strategy S4 hopes to also avoid local minima detected before the most
recent minimum.
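Step c of S3 and S4 can be sketched as follows (our illustration; the fixed step size is an assumption, as Bocharov's line-search details are not reproduced here):

```python
import numpy as np

def escape(f, x_star, d, step=0.1, max_steps=200):
    # Move away from the local minimum x_star along direction d; f must
    # first rise, and we stop as soon as f begins to decrease.  The point
    # reached becomes the starting point of the next minimization.
    d = np.asarray(d, dtype=float)
    alpha = step
    prev = f(x_star + alpha * d)
    for _ in range(max_steps):
        alpha += step
        cur = f(x_star + alpha * d)
        if cur < prev:                   # f has begun to decrease
            return x_star + alpha * d
        prev = cur
    return None                          # no decrease found within range
```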

Strategies S5 and S6 are considerably different from the first four methods. While S1-S4 attempt to 
choose good starting points for repeated local minimizations, S5 and S6 attempt to gain information 
about the entire search region S, gradually concentrating their attention on portions of S which are, in 
some sense, "likely" to contain the global minimum. S5 and S6 are most easily described for problems 
where S is determined by lower and upper bounds on each variable: 

S = {x ∈ E^n | l_i ≤ x_i ≤ L_i, i = 1, . . ., n}.

For ease of presentation we will restrict our attention to such problems. 
STRATEGY S5 (Piecewise Coordinate Projection — Zakharov [5]): 

a. Set up an initially empty list of points, and let S̄ = {x ∈ E^n | l̄_i ≤ x_i ≤ L̄_i, i = 1, . . ., n} be the
"remaining feasible region." Let S̄ = S initially.

b. Randomly choose N points x^k ∈ S̄, compute f(x^k) for each, and adjoin them to the list.

c. For each coordinate x_i of x (i = 1, . . ., n), separate the remaining feasible interval [l̄_i, L̄_i]
into m equal subintervals. Let X_ij = {x^k in the list whose ith component is in the jth subinterval of
[l̄_i, L̄_i]} = {x^k | (j − 1)(L̄_i − l̄_i)/m ≤ x_i^k − l̄_i < j(L̄_i − l̄_i)/m} for i = 1, . . ., n and j = 1, . . ., m. Then
X_i1, X_i2, . . ., X_im describe the projection of the list of points x^k into the m subintervals of the ith
coordinate axis.

d. By considering {f(x^k) | x^k ∈ X_ij} (i = 1, . . ., n; j = 1, . . ., m), select the subinterval set X_st
which is considered most likely to contain the global minimum. Briefly, this is done by selecting the
subinterval set for which the average functional value is smallest, being careful to avoid choices
based on insufficient information (for more details see Zakharov [5]).

e. By redefining l̄_s and L̄_s, delete the subinterval sets X_sj (j = 1, . . ., m; j ≠ t) from the remaining
feasible region. Delete each point x^k in the list whose sth coordinate is in a deleted subinterval X_sj.
Go to step b.

As the remaining feasible region S̄ gradually shrinks, the global minimum is bracketed more and more
closely. The difficulty with this method is that the most promising subinterval must be determined on
the basis of the sample of points x^k chosen so far. There is always a chance that a subinterval chosen
for deletion will, in fact, contain the global minimum solution, and once it is deleted it can never be
recovered.
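For concreteness, one cycle of S5 (steps b through e) may be sketched as follows. This is an illustrative Python rendering only: the parameter values N and m, and the minimum-count safeguard against thin evidence, are choices made here, not taken from Zakharov's implementation.

```python
import random

def s5_step(f, lo, hi, pts, N=20, m=4, min_count=3):
    """One cycle of strategy S5 on the box lo[i] <= x[i] <= hi[i].
    pts is the retained list of (x, f(x)) pairs; returns updated (lo, hi, pts)."""
    n = len(lo)
    # b. sample N new points in the remaining region and adjoin them to the list
    for _ in range(N):
        x = [random.uniform(lo[i], hi[i]) for i in range(n)]
        pts.append((x, f(x)))
    # c./d. for every coordinate i and subinterval j, average f over X_ij,
    # then keep the (i, j) cell with the smallest average ("most promising")
    best = None
    for i in range(n):
        w = (hi[i] - lo[i]) / m
        for j in range(m):
            cell = [fx for x, fx in pts
                    if j * w <= x[i] - lo[i] < (j + 1) * w]
            if len(cell) >= min_count:          # avoid thin evidence
                avg = sum(cell) / len(cell)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
    if best is None:
        return lo, hi, pts
    _, s, t = best
    # e. shrink coordinate s to subinterval t; drop list points now outside
    w = (hi[s] - lo[s]) / m
    lo[s], hi[s] = lo[s] + t * w, lo[s] + (t + 1) * w
    pts = [(x, fx) for x, fx in pts if lo[s] <= x[s] <= hi[s]]
    return lo, hi, pts
```

Repeated calls shrink the box around the subinterval sets judged most promising, which is exactly where the irreversibility noted above enters: a deleted subinterval is never resampled.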

Strategy S6 attempts to solve this problem by retaining the entire region S throughout and using 
a probabilistic allocation device to concentrate attention on areas in S which are most promising. This 
algorithm is new and is still under development. Initial results show some promise, but considerable 
improvement is still necessary. 


STRATEGY S6 (Coordinatewise Allocation): 

a. Define a marginal probability distribution function Φ_i on the feasible interval [l_i, L_i] of each
coordinate axis i = 1, . . ., n. In the absence of other information, a uniform distribution seems
reasonable for the initial distribution.

b. Randomly choose N points x^k ∈ S and compute f(x^k) for each. The probability distribution
functions Φ_i govern these choices in that the ith component x_i^k of x^k is chosen as a random sample
point from the distribution Φ_i. Thus, the Φ_i determine the allocation of trial points to various regions
in S.

c. Based on the results of the trials to date, modify the Φ_i to increase the allocation of future
points to regions considered likely to contain the global minimum. Go to step b.

Strategy S6 can have many realizations depending on the method of handling step c. In the version of
S6 reported in this paper, step c is performed as follows for each coordinate i = 1, . . ., n.

1. The feasible interval [l_i, L_i] is split into m subintervals.

2. A "success" is defined as a value of f(x^k) in the bottom 25 percent of all f(x^k) values, and the
ratios r_ij of the number of successes in subinterval j of coordinate i to the total number of points in
subinterval j are computed for all i and j.

3. The modified probability for subinterval j of coordinate i is given by p_ij = r_ij / Σ_j r_ij, the
normalized success ratio.

Several improvements on this allocation scheme are being considered for future testing. 
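The three-step allocation update may be sketched as follows. This is an illustrative Python fragment; the handling of empty subintervals and the uniform fallback when no subinterval holds a success are assumptions made here, not specified in the text.

```python
def update_marginals(samples, fvals, lo, hi, m=4):
    """Step c of S6: return subinterval probabilities p[i][j] for each
    coordinate i, computed as the normalized success ratios r_ij."""
    n = len(lo)
    # a "success" is an f value in the bottom 25 percent of all values
    cutoff = sorted(fvals)[max(0, len(fvals) // 4 - 1)]
    p = []
    for i in range(n):
        w = (hi[i] - lo[i]) / m
        succ, total = [0] * m, [0] * m
        for x, fx in zip(samples, fvals):
            j = min(int((x[i] - lo[i]) / w), m - 1)   # subinterval index
            total[j] += 1
            succ[j] += fx <= cutoff
        r = [succ[j] / total[j] if total[j] else 0.0 for j in range(m)]
        z = sum(r)
        # fall back to a uniform distribution if no subinterval has a success
        p.append([rj / z if z else 1.0 / m for rj in r])
    return p
```

Sampling a new trial point then draws, for each coordinate i, a subinterval with probability p[i][j] and a uniform value within it, which realizes a piecewise-uniform Φ_i.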

In early tests it became apparent that performance of the various strategies fluctuated considerably, 
depending on the particular test problem under investigation. For example, relative to the other strat- 
egies, S2 performed spectacularly on some problems but miserably on others. On closer examination it 
was found that S2 did well on problems for which the global f value was significantly lower than the local
minima and for which the global region of attraction was quite large; that is, on problems which were 
rather easy to solve. This suggests the need for a benchmark strategy to be used for assessing problem 
difficulty. The benchmark strategy should have as little structure as possible. We have chosen to use 
the pure random search method for this purpose. 

STRATEGY S0 (Pure Random Search — Brooks [2]):

a. Set k = 1.

b. Randomly select x^k ∈ S. Evaluate f(x^k).

c. Replace k with k + 1. Go to step b. At each stage retain the best f value found to date.

This strategy may be regarded as a benchmark method since it makes no attempt to take advantage of
the information gathered at previous stages. In this sense it is probably the most primitive strategy
possible. We can use S0 in two ways:

1. If a strategy does not do considerably better than S0, it should be discarded.

2. If a test problem is such that S0 can solve it nearly as well as the other strategies, then the
problem is not very difficult and probably is not useful for discriminating among strategies.
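For reference, S0 itself reduces to a few lines. The sketch below is an illustrative Python rendering, assuming uniform sampling over a box-shaped S:

```python
import random

def s0_pure_random(f, lo, hi, budget=1000):
    """Strategy S0: pure random search. Returns the best point and the
    best f value found within the evaluation budget."""
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = [random.uniform(l, h) for l, h in zip(lo, hi)]
        fx = f(x)
        if fx < best_f:                 # retain the best f value to date
            best_x, best_f = x, fx
    return best_x, best_f
```

Because no information from earlier evaluations influences later draws, its expected progress depends only on the sampling distribution and the budget, which is what makes it a clean benchmark.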


A number of computational experiments were performed to compare the various strategies pre- 
sented above. For each of the test functions employed, each strategy was run 30 times with different 


random number sequences. A run was allowed to continue until the algorithm had required 1,000
evaluations of the objective function f(x).

Test problems with predictable local and global solutions were constructed using the objective

f(x) = Σ_{j=1}^{m} c_j exp[(x − p_j)′ A_j (x − p_j)].

This function consists of the superposition of m modes, where mode j has depth c_j ∈ E^1, position p_j ∈ E^n,
and shape and width determined by the n × n negative definite matrix A_j. Particular test functions were
obtained by choosing the parameters c_j and p_j from a random number table. A_j was chosen to ensure that
the m modes were narrow enough that they did not completely merge into one another.
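A test function of this type may be sketched as follows. This Python rendering assumes the exponential mode shape as one form consistent with the description (depths c_j < 0, centers p_j, negative definite shape matrices A_j); the particular parameter values below are illustrative only.

```python
import math

def make_test_function(n, modes):
    """Superposition objective: f(x) = sum_j c_j * exp((x - p_j)' A_j (x - p_j)),
    where each c_j < 0 is a mode depth and each A_j is negative definite,
    so mode j is a well of depth |c_j| centered at p_j."""
    def f(x):
        total = 0.0
        for c, p, A in modes:
            d = [xi - pi for xi, pi in zip(x, p)]
            quad = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
            total += c * math.exp(quad)
        return total
    return f
```

With A_j strongly negative definite (e.g. A_j = −aI for large a), the modes barely interact and the value of the global minimum is close to min_j c_j, which is what makes the solutions predictable.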

Strategies S1 through S4 require an unconstrained minimizer. Since the purpose of the study is to
compare global strategies, a minimizer is desired which uses the same information as is available to the
other strategies — function values, but not derivatives. Powell's derivative-free method was selected [4].


The computational results obtained are summarized in Tables 1 and 2. Table 1 gives characteristics
of the test problems used. Table 2 lists for each problem and for each strategy the best f value obtained
after 200, 500, and 1,000 function evaluations. Each value is the average of the 30 trials conducted for that
problem and strategy. The percentage of the 30 trials which did not locate the global minimum after
1,000 function evaluations is also given in Table 2. It is difficult to obtain a single measure of
performance for this kind of problem, since we must balance speed of convergence against the chance
that the global solution will be missed entirely.

Table 1. Characteristics of Test Problems

[The body of Table 1 (number of variables, number of modes, and value of the global minimum for each
test problem) is not recoverable from this scan; the surviving global minimum values include −9.9, −8.9,
and −12.7.]

Table 2. Test Results

[The body of Table 2 is not recoverable from this scan.]
The test results support some general conclusions:

1. [Some of the test problems were not] very challenging, since S0 did nearly as well as most other
strategies.

2. [S2 converges quickly] but frequently stops short of the global solution.

3. In general, S1, S3, and S4 perform about the same and better than the other strategies.

4. S5 and S6 exhibit slow initial convergence. Both frequently tend to concentrate the search effort
around a good local minimum which is not global.

5. On difficult problems even the best of these methods will frequently fail to locate the global
minimum.
It is also interesting to examine the entire graph of the number of function evaluations versus the
best function value obtained for each strategy. These curves are shown for test function H in Figure 1. 
The results for function H are representative of those obtained for the other functions and serve to 
emphasize conclusions 2, 3, and 4, above. 

Note on this graph that S5 and S6 display a consistent decrease at an initial rate which is similar 
to that of the better strategies S3 and S4. However, since they start much higher on the graph, S5 
and S6 never catch up. This is inherent in the methods. Given any starting point x^k, S3 and S4
immediately search to a local minimum, and thus quickly get a fairly low objective function value. Starting
from the same initial point, S5 and S6 merely note the objective value and proceed to check other points, 
doing a global survey instead of a local minimization. Thus, in the initial stages, S5 and S6 are essen- 
tially identical to SO, pure random search. It is only after considerable information has been accumu- 
lated that these methods can concentrate their attention on promising search areas. 




[Figure 1 is not reproduced in this scan. It plots the best function value obtained (vertical axis,
roughly −6.0 to −9.0) against the number of function evaluations (100 to 1,000) for each strategy,
including curves for S2 and S4.]

FIGURE 1. Performance of strategies on test function H (average of 30 trials for each strategy).

576 J. K. HARTMAN

A comparison of S0, S1, S2, S3, and S4 is also interesting. In general, it seems that among these
strategies, which alternate cycles of random searching with unconstrained minimizations, the best results
are obtained by the methods which do the least random searching. Thus, S0 is purely random search,
and its performance is the worst. S2 requires several (perhaps many) random evaluations before each 
minimization to find a point better than the current best local solution, and its performance is second 
worst. Strategy S1 selects one random point x^k before each minimization, while S3 selects one random
direction d^k. Their performance is similar and almost as good as that of S4, which makes no random
selections between minimizations. This strongly suggests that an improved strategy will consist of 
frequent minimizations coupled with an improved way of selecting starting points which are promising 
and which also sample the entire region. 

In conclusion, it is appropriate to note that these six methods do not come near to exhausting the 
possible techniques for avoiding local solutions. Methods which are hybrids of these and entirely new 
methods should be tested. In particular, we hope to develop an algorithm which allocates unconstrained 
minimizations to various regions similar to the way strategy S6 allocates the individual points x^k.
Such a method would combine the rapid local optimizing power of the minimization method with a 
global analysis of the feasible region. 


[1] Bocharov, N. and A. A. Feldbaum, "An Automatic Optimizer for the Search of the Smallest of
Several Minima," Automation and Remote Control, Vol. 23, No. 3 (1962).

[2] Brooks, S. H., "A Discussion of Random Methods for Seeking Maxima," Operations Research 6, 
244 (1958). 

[3] Fiacco, A. V. and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimi- 
zation Techniques, John Wiley and Sons, Inc., New York, (1968). 

[4] Powell, M. J. D., "An Efficient Method for Finding the Minimum of a Function of Several Variables 
Without Calculating Derivatives," Computer Journal 7, 155 (1964). 

[5] Zakharov, V. V., "A Random Search Method," Engineering Cybernetics 2, 26 (1969).




B. Mond 

La Trobe University 
Bundoora, Melbourne, Australia 

B. D. Craven 

University of Melbourne 
Melbourne, Australia 


Consider the fractional programming problem with linear constraints. Problem 1 (PI): 

(1) Maximize f(x)/g(x)

Subject to

(2) Ax ≤ b

(3) x ≥ 0.

It is assumed that the problem is regular, i.e., that the constraint set is nonempty and bounded and that
f and g do not simultaneously become zero.

There has been a great deal of interest in various special cases of P1. In particular, if f and g are
linear, Charnes and Cooper [1] showed that optimal solutions can be determined from optimal solutions
of two associated linear programming problems. Charnes and Cooper's result was extended to the ratio
of two quadratic functions by Swarup [3]. He considered P1 with f and g quadratic, and showed that
an optimal solution, if it exists, can be obtained from the solutions of two associated quadratic program-
ming problems, each with linear constraints and one quadratic constraint. Sharma [2] considered
P1 with f and g polynomials. He showed that an optimal solution, if it exists, can be obtained from the
solutions of two associated programming problems where the objective function is a polynomial and
the constraints are all linear except for one polynomial constraint.

Here we consider a much wider class of functionals f and g, and obtain a theorem that includes as
special cases the corresponding results of [1], [2], and [3]. 


A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m, t ∈ R; f and g are mappings from R^n into R. φ denotes a monotone strictly
increasing function from R into R, with φ(t) > 0 for t > 0.



For a specified function φ, define the functions F and G, for real positive t and y ∈ R^n, by

(4) F(y, t) = f(y/t)φ(t) and G(y, t) = g(y/t)φ(t),

(5) F(y, 0) = lim_{t→0} F(y, t) and G(y, 0) = lim_{t→0} G(y, t)

whenever these limits exist. Assume that G(0, 0) = 0 whenever it exists.


Let us introduce the transformation y = tx, where for the specified function φ and a nonzero constant
Λ ∈ R, we require

(6) G(y, t) = Λ.

On multiplying numerator and denominator of (1) by φ(t), and using (4) and (6), we obtain the asso-
ciated nonlinear programming problem. Problem 2 (P2):

Maximize F(y, t),
Subject to

(7) Ay − bt ≤ 0

(8) G(y, t) = Λ

(9) y ≥ 0

(10) t ≥ 0.
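The substance of the transformation is that at any point with x = y/t and t > 0, the ratio F(y, t)/G(y, t) equals f(x)/g(x), so maximizing F subject to G = Λ tracks the original fractional objective. A quick numerical check in Python (the particular f, g, and φ below are illustrative choices, not from the text):

```python
def check_transformation(f, g, phi, x, t):
    """Compare F(y, t)/G(y, t) at y = t*x with f(x)/g(x), where
    F(y, t) = f(y/t)*phi(t) and G(y, t) = g(y/t)*phi(t), as in (4)."""
    y = [t * xi for xi in x]
    F = f([yi / t for yi in y]) * phi(t)
    G = g([yi / t for yi in y]) * phi(t)
    return F / G, f(x) / g(x)

# an illustrative quadratic ratio with phi(t) = t^2
f = lambda x: x[0] ** 2 + 1.0
g = lambda x: x[0] ** 2 + x[0] + 2.0
lhs, rhs = check_transformation(f, g, lambda t: t * t, [3.0], 0.5)
```

Here lhs and rhs agree up to rounding, since the common factor φ(t) cancels from the ratio.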

LEMMA: If the point (y, t) satisfies the constraints of P2, then t > 0.

REMARK: This only requires proof if G(y, 0) is defined, by (5). This is automatically the case
if g is affine and φ(t) = t.

By (8), the point (0, 0) is not feasible for P2.

PROOF: Assume that the point (y, 0) is feasible for P2; then y ≠ 0. Let x be feasible for P1.
Since the constraints are linear, x + ky is feasible for P1 for any positive k, contradicting the bounded-
ness of the constraint set of P1.

THEOREM 1: If (i) 0 < sgn Λ = sgn g(x*) for an optimal solution x* of P1, and (ii) (y*, t*) is an
optimal solution of P2, then y*/t* is an optimal solution of P1.

PROOF: Assume that the theorem is false, i.e., assume that there exists an optimal x* such that

(11) f(x*)/g(x*) > f(y*/t*)/g(y*/t*).

By condition (i),

g(x*) = θΛ for some θ > 0.


Consider t = φ^{−1}(1/θ) and y = φ^{−1}(1/θ)x*. Then φ(t)g(x*) = G(y, t) = Λ, so (y, t) satisfies (8).
It also satisfies (9) and (10), and also (7) since the constraint is linear. Thus (y, t) is a feasible point
for P2. Now

(12) f(x*)/g(x*) = φ(t)f(x*)/[φ(t)g(x*)] = F(y, t)/Λ,

(13) f(y*/t*)/g(y*/t*) = φ(t*)f(y*/t*)/[φ(t*)g(y*/t*)] = F(y*, t*)/Λ.

Hence, for feasible (y, t), (11), (12), and (13) show that F(y, t) > F(y*, t*), contradicting the assump-
tion that (y*, t*) is optimal for P2.

If sgn g(x*) < 0 for x* an optimal solution of P1, then replace f and g by −f and −g; the functional
is unaltered and for the new denominator we have −g(x*) > 0.

Thus, if P1 has a solution, it can be obtained by solving, for Λ = 1 and Λ = −1, the two nonlinear
programming problems. Problem 3 (P3):

Maximize F(y, t)

Subject to

Ay − bt ≤ 0

G(y, t) = 1

y ≥ 0, t ≥ 0

and Problem 4 (P4):

Maximize −F(y, t)

Subject to

Ay − bt ≤ 0

G(y, t) = −1

y ≥ 0, t ≥ 0.


If f and g are linear and φ(t) = t, then our theorem gives the result of Charnes and Cooper [1].
If f and g are quadratic functions and φ(t) = t^2, we obtain the result of Swarup [3]. If f and g are
polynomials of degree m and φ(t) = t^m, we obtain the result of Sharma [2].


If f and g are homogeneous of degree k, and φ(t) = t^k, then F(y, t) = f(y), G(y, t) = g(y), and
P2 takes a simple form. An example is

f(x) = d′x + (x′Cx)^{1/2},

where C is a positive semidefinite matrix.


As noted in [2] and [3], even if G(y, t) is a convex function of y and t, the constraint set of P3 is
not necessarily convex. Instead of P3, therefore, it is sometimes more convenient to deal with the fol-
lowing Problem 3' (P3'):

Maximize F(y, t)

Subject to

Ay − bt ≤ 0

G(y, t) ≤ 1

y ≥ 0, t ≥ 0.


If G(y, t) is a convex function of the vector variable (y, t) then the constraint set of P3' is convex. 

It should be noted that even if g(x) is convex with respect to x, G(y, t) need not be convex with
respect to the vector variable (y, t). As an example, consider g(x) = x′Cx − k, where C is a positive
semidefinite symmetric matrix and k is a positive scalar. Thus g(x) is convex with respect to x. Taking
φ(t) = t^2, G(y, t) = y′Cy − kt^2, which is not convex.
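This failure of convexity is easy to witness numerically via the midpoint inequality (a Python sketch; the scalar case C = 1, k = 1 is an illustrative choice):

```python
def midpoint_convexity_violated(G, a, b, tol=1e-12):
    """Return True if G((a+b)/2) > (G(a) + G(b))/2 + tol, which
    witnesses that G is not convex along the segment from a to b."""
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return G(mid) > (G(a) + G(b)) / 2 + tol

# g(x) = x'Cx - k is convex in x; with phi(t) = t^2 and scalar C = 1, k = 1,
# G(y, t) = y^2 - t^2, which is indefinite in (y, t)
G = lambda v: v[0] ** 2 - v[1] ** 2
violated = midpoint_convexity_violated(G, [0.0, -1.0], [0.0, 1.0])
```

Along the t-axis segment from (0, −1) to (0, 1), G takes the value −1 at both endpoints but 0 at the midpoint, so the chord lies below the graph and convexity fails.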

If (y*, t*) is optimal for P3', t* > 0, and G(y*, t*) = 0, then max P1 may be ∞, since x* = y*/t*
is feasible for P1 and g(x*) = 0. If G(y*, t*) = Λ₁, where 0 < Λ₁ < ∞, then (y*, t*) is also optimal for
P2 with Λ = Λ₁, so Theorem 1 applies. However, the optimum of P3' can occur at (y, t) = (0, 0),
which does not correspond to an optimum of P1. For example, if P1 is the program (for a real variable x)

P1: Maximize f(x) = (−x − 3)/(x + 1) subject to x ≤ 2 and x ≥ 0,

then taking φ(t) = t, the corresponding P3' is:

P3': Maximize −y − 3t subject to y + t ≤ 1, y ≥ 0, t ≥ 0, y − 2t ≤ 0.

The maximum for P3' then occurs at (y, t) = (0, 0); but the maximum for P1 occurs at x = 2.
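The behavior of this example can be confirmed by direct enumeration (an illustrative Python check; the grid resolution is an arbitrary choice):

```python
def best_on_grid(obj, feasible, grid):
    """Return the grid point maximizing obj over the feasible points."""
    pts = [p for p in grid if feasible(p)]
    return max(pts, key=obj)

steps = [i / 100 for i in range(301)]            # grid 0.00, 0.01, ..., 3.00
# P1: maximize (-x - 3)/(x + 1) subject to 0 <= x <= 2
x_star = best_on_grid(lambda x: (-x - 3) / (x + 1),
                      lambda x: 0 <= x <= 2,
                      steps)
# P3': maximize -y - 3t subject to y + t <= 1 and y - 2t <= 0
# (y >= 0 and t >= 0 hold automatically on this nonnegative grid)
yt_star = best_on_grid(lambda p: -p[0] - 3 * p[1],
                       lambda p: p[0] + p[1] <= 1 and p[0] - 2 * p[1] <= 0,
                       [(a, b) for a in steps for b in steps])
```

The enumeration returns x_star = 2, since (−x − 3)/(x + 1) is increasing on [0, 2], while the P3' maximum lands at the origin, illustrating the caveat above.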

Similarly, instead of P4, it might be more convenient to consider Problem 4' (P4'):

Maximize −F(y, t)

Subject to

Ay − bt ≤ 0

G(y, t) ≥ −1

y ≥ 0, t ≥ 0.

If G is concave in the vector variable (y, t), the constraint set of P4' is convex.


[1] Charnes, A. and W. W. Cooper, "Programming With Linear Fractional Functionals," Nav. Res. 

Log. Quart. 9, 181-186 (1962). 
[2] Sharma, I. C., "Feasible Direction Approach to Fractional Programming Problems," Opsearch 4,

61-72 (1967). 
[3] Swarup, K., "Programming with Quadratic Fractional Functionals," Opsearch 2, 23-30 (1965). 

Mathematical Models of Target Coverage and Missile Allocation 

The Military Operations Research Society announces that it now has copies of its first monograph, 
"Mathematical Models of Target Coverage and Missile Allocation" by A. Ross Eckler and Stefan A. 
Burr, available for sale. The book may be purchased for $7.50 postpaid by contacting: MORS, 101 
South Whiting St., Alexandria, Va. 22304. 

The monograph presents a comprehensive summary of analytical models primarily used for 
problems in strategic defense but applicable to a wide variety of more generalized resource allocation 
problems. The topics discussed include models of defended point targets, circular targets, gaussian 
targets, generalized area targets, groups of identical targets, and nonidentical targets. Offense and
defense strategies are examined under alternative assumptions about the information available to both
sides. An extensive bibliography is included.




The NAVAL RESEARCH LOGISTICS QUARTERLY is devoted to the dissemination of 
scientific information in logistics and will publish research and expository papers, including those 
in certain areas of mathematics, statistics, and economics, relevant to the over-all effort to improve 
the efficiency and effectiveness of logistics operations. 

Manuscripts and other items for publication should be sent to The Managing Editor, NAVAL 
RESEARCH LOGISTICS QUARTERLY, Office of Naval Research, Arlington, Va. 22217. 
Each manuscript which is considered to be suitable material for the QUARTERLY is sent to one 
or more referees. 

Manuscripts submitted for publication should be typewritten, double-spaced, and the author 
should retain a copy. Refereeing may be expedited if an extra copy of the manuscript is submitted 
with the original. 

A short abstract (not over 400 words) should accompany each manuscript. This will appear 
at the head of the published paper in the QUARTERLY. 

There is no authorization for compensation to authors for papers which have been accepted 
for publication. Authors will receive 250 reprints of their published papers. 

Readers are invited to submit to the Managing Editor items of general interest in the field 
of logistics, for possible publication in the NEWS AND MEMORANDA or NOTES sections 
of the QUARTERLY. 




VOL. 20, NO. 3 

NAVSO P-1278 



Sequential Determination of Inspection Epochs for Reliability 
Systems with General Lifetime Distributions 

An Empirical Bayes Estimator for the Scale Parameter of the 
Two-Parameter Weibull Distribution 

Optimal Allocation of Unreliable Components for Maximizing 
Expected Profit Over Time 

A Continuous Submarine Versus Submarine Game 

Total Optimality of Incrementally Optimal Allocations 

An Approach to the Allocation of Common Costs of Multi- 
Mission Systems 

An Explicit General Solution in Linear Fractional Programming 

Using Decomposition in Integer Programming 

Numerical Treatment of a Class of Semi-Infinite Programming 

Min/Max Bounds for Dynamic Network Flows 

Production-Allocation Scheduling and Capacity Expansion 
Using Network Flows Under Uncertainty 

Concave Minimization over a Convex Polyhedron 

Estimation of a Hidden Service Distribution of an M/G/∞

The Single Server Queue in Discrete Time — Numerical Analysis

Some Experiments in Global Optimization 

A Note on Mathematical Programming with Fractional Objec- 
tive Functions 

News and Memoranda 

S. ZACKS, 377

L. D. STONE, 419

[The remaining author names and page numbers of the contents listing are not recoverable from this scan.]